From: Shankar H. <har...@ya...> - 2012-07-09 21:32:15
I ran the same pgbench test after reconfiguring max_prepared_transactions to 100 (equal to the max_connections configured on either coordinator) just to see how it fares. Findings from the run are shared below.

Node 1 - Coord1, Datanode1, gtm-proxy1
Node 2 - Coord2, Datanode2, gtm-proxy2
Node 3 - Datanode3, gtm

This time I definitely did see a spike in numbers (compared to my last run, where max_prepared_transactions was at 10) but started seeing errors with 10 concurrent connections to Coord1. The errors seen were different, though.

Test results:

Clients | Threads | Duration | Transactions
1       | 1       | 100      | 5288
2       | 2       | 100      | 9012
4       | 4       | 100      | 13998
6       | 6       | 100      | 17451
8       | 8       | 100      | 20450
10      | 10      | 100      | 22766  -> 3% bump compared to last run
12      | 12      | 100      | 25694  -> 24% bump compared to last run

10 clients:
Client 9 aborted in state 12: ERROR: GTM error, could not obtain snapshot

12 clients:
Client 11 aborted in state 11: ERROR: GTM error, could not obtain snapshot
Client 8 aborted in state 11: ERROR: GTM error, could not obtain snapshot

14 clients:
The run was left hanging after a few GTM errors.

Question: these snapshot errors were seen on all 3 nodes' consoles. What could cause this error? Is the proxy a bottleneck now, due to all load being applied on node 1 instead of being split between node 1 and node 2?

Some more info from the nodes below.

node1 - coordinator1

postgres=# select * from pg_stat_activity;
 datid | datname  | procpid | usesysid | usename  | application_name | client_addr      | client_hostname | client_port | backend_start                 | xact_start                    | query_start                   | waiting | current_query
-------+----------+---------+----------+----------+------------------+------------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+---------+---------------------------------
 12804 | postgres |   10016 |       10 | postgres | psql             |                  |                 |          -1 | 2012-06-18 11:17:28.781838-05 |                               | 2012-07-05 15:28:45.498451-05 | f       | <IDLE>
 12804 | postgres |   22951 |       10 | postgres | psql             |                  |                 |          -1 | 2012-06-21 03:11:11.030994-05 | 2012-07-08 07:13:51.662961-05 | 2012-07-08 07:15:23.654176-05 | f       | select * from pg_stat_activity;
 12804 | postgres |   22472 |       10 | postgres |                  | <pgbench client> |                 |       57249 | 2012-06-21 02:52:29.436629-05 |                               | 2012-07-08 06:50:27.698791-05 | f       | <IDLE> in transaction (aborted)
 12804 | postgres |   22475 |       10 | postgres |                  | <pgbench client> |                 |       57252 | 2012-06-21 02:52:29.44397-05  |                               | 2012-07-08 06:50:27.694543-05 | f       | <IDLE> in transaction (aborted)
(4 rows)

node2 - coordinator2

postgres=# select * from pg_stat_activity;
 datid | datname  | procpid | usesysid | usename  | application_name | client_addr | client_hostname | client_port | backend_start                 | xact_start                    | query_start                   | waiting | current_query
-------+----------+---------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+---------+---------------------------------
 12804 | postgres |   14601 |       10 | postgres | pgxc             | node1       |                 |       45669 | 2012-07-08 03:17:13.724456-05 |                               | 2012-07-08 06:24:21.055871-05 | f       | <IDLE>
 12804 | postgres |   17271 |       10 | postgres | psql             |             |                 |          -1 | 2012-07-08 07:08:28.768987-05 | 2012-07-08 07:14:04.887459-05 | 2012-07-08 07:15:18.278305-05 | f       | select * from pg_stat_activity;
(2 rows)

All 3 datanodes 1/2/3 had 16 idle connections, with about 14 originating from coord1 and just 1 from coord2. This, I am guessing, is because all traffic originated from coord1. Is that right?

thanks,
Shankar
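A quick way to watch two-phase-commit pressure during a run like this is to poll the standard pg_prepared_xacts view on each coordinator and datanode while pgbench is going, and compare it against the configured ceiling. A minimal sketch (plain PostgreSQL views, nothing XC-specific):

-- How close are we to the ceiling? pg_prepared_xacts lists transactions
-- that have been PREPAREd but not yet committed or rolled back.
SHOW max_prepared_transactions;

SELECT count(*) AS prepared_now, min(prepared) AS oldest_prepare_time
FROM pg_prepared_xacts;

-- Per-database breakdown, handy when several tests share a node.
SELECT database, count(*) FROM pg_prepared_xacts GROUP BY database ORDER BY 2 DESC;

If prepared_now sits at the limit while the GTM snapshot errors appear, the two problems would seem related; if it stays low, the snapshot errors point more toward the GTM/proxy path.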
________________________________
From: "pos...@li..." <pos...@li...>
To: pos...@li...
Sent: Monday, July 9, 2012 11:33 AM
Subject: Postgres-xc-developers Digest, Vol 25, Issue 23

Send Postgres-xc-developers mailing list submissions to pos...@li...

To subscribe or unsubscribe via the World Wide Web, visit https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers or, via email, send a message with subject or body 'help' to pos...@li...

You can reach the person managing the list at pos...@li...

When replying, please edit your Subject line so it is more specific than "Re: Contents of Postgres-xc-developers digest..."

Today's Topics:

   1. Trigger support in XC (pramodh mereddy)
   2. Re: Question on gtm-proxy (Shankar Hariharan)

----------------------------------------------------------------------

Message: 1
Date: Mon, 9 Jul 2012 08:32:25 -0500
From: pramodh mereddy <pos...@gm...>
Subject: [Postgres-xc-developers] Trigger support in XC
To: pos...@li...
Message-ID: <CAK...@ma...>
Content-Type: text/plain; charset="iso-8859-1"

1) Are triggers fully supported in XC?
2) Can I set up slony on datanodes to replicate certain tables to a different postgres cluster?

Pramodh Mereddy

-------------- next part --------------
An HTML attachment was scrubbed...

------------------------------

Message: 2
Date: Mon, 9 Jul 2012 09:11:53 -0700 (PDT)
From: Shankar Hariharan <har...@ya...>
Subject: Re: [Postgres-xc-developers] Question on gtm-proxy
To: Ashutosh Bapat <ash...@en...>
Cc: "pos...@li..." <pos...@li...>
Message-ID: <134...@we...>
Content-Type: text/plain; charset="iso-8859-1"

Thanks Ashutosh. You are right: while running this test I just had pgbench running against one coordinator. Looks like pgbench by itself may not be an apt tool for this kind of testing; I will instead run pgbench's underlying sql script from the cmdline against either coordinator. Thanks for that tip.

I got a lot of input on my problem from a lot of folks on the list; the feedback is much appreciated. Thanks everybody! On max_prepared_transactions, I will factor in the number of coordinators and the max_connections on each coordinator while arriving at a figure. Will also try out Koichi Suzuki's suggestion to have multiple NICs on the GTM. I will post my findings here for the same cluster configuration as before.

thanks,
Shankar

________________________________
From: Ashutosh Bapat <ash...@en...>
To: Shankar Hariharan <har...@ya...>
Cc: "pos...@li..." <pos...@li...>
Sent: Sunday, July 8, 2012 11:02 PM
Subject: Re: [Postgres-xc-developers] Question on gtm-proxy

Hi Shankar,
You have got answers to the prepared transaction problem, I guess. I have something else below.

On Sat, Jul 7, 2012 at 1:44 AM, Shankar Hariharan <har...@ya...> wrote:

As planned I ran some tests using pgbench on this setup:
>
>Node 1 - Coord1, Datanode1, gtm-proxy1
>Node 2 - Coord2, Datanode2, gtm-proxy2
>Node 3 - Datanode3, gtm
>
>I was connecting via Coord1 for these tests:
>- scale factor of 30 used
>- tests run using the following input parameters for pgbench:

Try connecting to both the coordinators; it should give you better performance, especially when you are using distributed tables. With distributed tables, the coordinator gets involved in query execution more than in the case of replicated tables. So, balancing load across the two coordinators would help.

>
>Clients | Threads | Duration | Transactions
>1       | 1       | 100      | 6204
>2       | 2       | 100      | 9960
>4       | 4       | 100      | 12880
>6       | 6       | 100      | 16768
>8       | 8       | 100      | 19758
>10      | 10      | 100      | 21944
>12      | 12      | 100      | 20674
>
>The run went well until the 8 clients.
I started seeing errors from 10 clients onwards, and eventually the 14 client run has been hanging around for over an hour now. The errors I have been seeing on the console are the following:
>
>pgbench console:
>Client 8 aborted in state 12: ERROR: GTM error, could not obtain snapshot
>Client 0 aborted in state 13: ERROR: maximum number of prepared transactions reached
>Client 7 aborted in state 13: ERROR: maximum number of prepared transactions reached
>Client 11 aborted in state 13: ERROR: maximum number of prepared transactions reached
>Client 9 aborted in state 13: ERROR: maximum number of prepared transactions reached
>
>node console:
>ERROR: GTM error, could not obtain snapshot
>STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (253, 26, 1888413, -817, CURRENT_TIMESTAMP);
>ERROR: maximum number of prepared transactions reached
>HINT: Increase max_prepared_transactions (currently 10).
>STATEMENT: PREPARE TRANSACTION 'T201428'
>ERROR: maximum number of prepared transactions reached
>STATEMENT: END;
>ERROR: maximum number of prepared transactions reached
>STATEMENT: END;
>ERROR: maximum number of prepared transactions reached
>STATEMENT: END;
>ERROR: maximum number of prepared transactions reached
>STATEMENT: END;
>ERROR: GTM error, could not obtain snapshot
>STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (140, 29, 2416403, -4192, CURRENT_TIMESTAMP);
>
>I was also watching the processes on each node and see the following for the 14 client run:
>
>Node1:
>postgres 25571 10511  0 04:41 ?        00:00:02 postgres: postgres postgres ::1(33481) TRUNCATE TABLE waiting
>postgres 25620 11694  0 04:46 ?        00:00:00 postgres: postgres postgres pgbench-address (50388) TRUNCATE TABLE
>
>Node2:
>postgres 10979  9631  0 Jul05 ?        00:00:42 postgres: postgres postgres coord1-address(57357) idle in transaction
>
>Node3:
>postgres 20264  9911  0 08:35 ?        00:00:05 postgres: postgres postgres coord1-address(51406) TRUNCATE TABLE waiting
>
>I was going to restart the processes on all nodes and start over, but did not want to lose this data as it could be useful information.
>
>Any explanation of the above issue is much appreciated. I will try the next run with a higher value set for max_prepared_transactions. Any recommendations for a good value on this front?
>
>thanks,
>Shankar
>
>________________________________
> From: Shankar Hariharan <har...@ya...>
>To: Ashutosh Bapat <ash...@en...>
>Cc: "pos...@li..." <pos...@li...>
>Sent: Friday, July 6, 2012 8:22 AM
>Subject: Re: [Postgres-xc-developers] Question on gtm-proxy
>
>Hi Ashutosh,
>I was trying to size the load on a server and was wondering if a GTM could be shared w/o much performance overhead between a small number of datanodes and coordinators. I will post my findings here.
>thanks,
>Shankar
>
>________________________________
> From: Ashutosh Bapat <ash...@en...>
>To: Shankar Hariharan <har...@ya...>
>Cc: "pos...@li..." <pos...@li...>
>Sent: Friday, July 6, 2012 12:25 AM
>Subject: Re: [Postgres-xc-developers] Question on gtm-proxy
>
>Hi Shankar,
>Running gtm-proxy has been shown to improve performance, because it lessens the load on the GTM by serving requests locally. Why do you want the coordinators to connect directly to the GTM? Are you seeing any performance improvement from doing that?
>
>On Fri, Jul 6, 2012 at 10:08 AM, Shankar Hariharan <har...@ya...> wrote:
>
>>Follow up to earlier email.
>>In the setup described below, can I avoid using a gtm-proxy? That is, can I just simply point the coordinators to the one gtm running on node 3?
>>My initial plan was to just run the gtm on node 3; then I thought I could try a datanode without a local coordinator, which was why I put these two together on node 3.
>>thanks,
>>Shankar
>>
>>________________________________
>> From: Shankar Hariharan <har...@ya...>
>>To: "pos...@li..." <pos...@li...>
>>Sent: Thursday, July 5, 2012 11:35 PM
>>Subject: Question on multiple coordinators
>>
>>Hello,
>>
>>Am trying out XC 1.0 in the following configuration.
>>Node 1 - Coord1, Datanode1, gtm-proxy1
>>Node 2 - Coord2, Datanode2, gtm-proxy2
>>Node 3 - Datanode3, gtm
>>
>>I set up all nodes but forgot to add Coord1 to Coord2 and vice versa. In addition, I missed the pg_hba edit as well. So the first table T1 that I created for distribution from Coord1 was not "visible" from Coord2, but was on all the data nodes.
>>I tried to get Coord2 back into business in various ways, but the first table I created refused to show up on Coord2:
>>- edit pg_hba and add the node on both coord1 and 2, then run select pgxc_pool_reload();
>>- restart coord 1 and 2
>>- drop node c2 from c1 and c1 from c2 and add them back, followed by select pgxc_pool_reload();
>>
>>So I tried to create the same table T1 from Coord2 to observe the behavior, and it clearly did not like it, as all the nodes it "wrote" to reported that the table already existed, which was good. At this point I could understand that Coord2 and Coord1 were not talking properly, so I created a new table from coord1 with replication. This table was visible from both now.
>>
>>Question is: should I expect to see the first table, let me call it T1, after a while from Coord2 also?
>>
>>thanks,
>>Shankar
>
>--
>Best Wishes,
>Ashutosh Bapat
>EnterpriseDB Corporation
>The Enterprise Postgres Company

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company

-------------- next part --------------
An HTML attachment was scrubbed...

------------------------------

End of Postgres-xc-developers Digest, Vol 25, Issue 23
******************************************************
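For anyone hitting the same coordinator-registration problem: the usual sequence is to register each coordinator on the other and then refresh the connection pool. A minimal sketch, assuming XC 1.0's node DDL; the node names, hosts, and port here are illustrative placeholders:

-- On Coord1: tell it about Coord2 (host and port are placeholders).
CREATE NODE coord2 WITH (TYPE = 'coordinator', HOST = 'node2', PORT = 5432);
SELECT pgxc_pool_reload();

-- On Coord2: the mirror-image registration.
CREATE NODE coord1 WITH (TYPE = 'coordinator', HOST = 'node1', PORT = 5432);
SELECT pgxc_pool_reload();

As the thread suggests, DDL issued while a coordinator was unregistered is not replayed retroactively, so a table like T1 would still need to be created on the catching-up coordinator by hand.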
From: Nikhil S. <ni...@st...> - 2012-07-09 20:50:26
>>>> There are many trailing white spaces in the patch. Please fix those; they unnecessarily fail the automatic merges sometimes. You can do that when you commit the patch.
>>>
>>> Oh OK, I didn't notice. Do you have some places particularly in mind?
>>
>> Apply your patch on a clean repository using git apply and it will show you.
>
> I'll make a test.

In your .gitconfig in your home dir, if you set the below, it will typically show the trailing blanks in red:

[color]
        ui = auto

Pretty useful before submitting a patch. There are other options like -b, -w or --ignore-space-at-eol that should be able to help, but I am happy with the ui setting above.

HTH,
Nikhils

>>>> Code
>>>> ====
>>>> 1. There is a lot of code which refactors existing code and renames functions, and which is not necessarily related to the redistribution work. Can you please provide separate patches for this refactoring? We should commit them separately. For example, build_subcluster_data() has been renamed (for good, maybe), but it makes sense if we do it separately. Someone looking at the ALTER TABLE commit should not get overwhelmed by the extraneous changes.
>>>
>>> OK. The problem with the functions currently on master was that their names were not really generic and sometimes did not reflect their real functionality. So as the plan is now to use them in a more general way, I think their names are not going to change anymore.
>>>
>>>> 2. Same is the case with the grammar changes. Please separate the grammar changes related to pgxc_nodelist etc. into a separate patch, although it's because of ALTER TABLE that you need to do those changes.
>>>
>>> OK, understood.
>>>
>>>> Please get these patches reviewed as well, since I haven't looked at the changes proper.
>>>
>>> Understood. I'll make those 2 patches tomorrow morning, not a big deal.
>>>
>>>> Tests
>>>> =====
>>>> 1. There is no need to test with huge data; that slows down regression. For performance testing, you can create a separate test (not to be included in regression), if you want.
>>>
>>> That may be an idea. However, you are right, I'll limit the number of rows tested.
>>>
>>>> 2. We need tests which will test the plan cache (in)validation upon redistribution of data, and tests for existing views working after the redistribution. Please take a look at the PG alter table test for more such scenarios.
>>>
>>> OK, I'll add those scenarios. They will be included in xc_alter_table.
>>>
>>>> If you happen to add some performance tests, it would also be good to test the sanity of concurrent transactions accessing the object/s being redistributed. It's vital considering that such a redistribution would run for longer.
>>>
>>> Yes, it would be nice to.
>>>
>>>> 3. Instead of relying on count(*) to show the sanity of the redistributed data, you may use better aggregates like array_agg or sum(), avg() and count(). I would prefer array_agg over the others, since you can list all the data values there. You will need the aggregate's ORDER BY clause (not that of the SELECT).
>>>> 4. In the case of redistribution of a table with an index, you will need to check the sanity of the index after the redistribution by some means.
>>>
>>> Do you have an idea of how to do that? Pick up some tests from postgres?
>>
>> Good question. But I don't have an answer (specifically for XC, since the indexes are on datanodes).
>
> OK, I'll figure something out myself.
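Points 3 and 4 above lend themselves to a small, deterministic sanity recipe. A sketch, assuming a hypothetical table t_dist with an index on its val column, and using the provisional ALTER TABLE redistribution syntax under review in this thread:

-- Baseline before redistribution; the aggregate's ORDER BY makes the
-- output deterministic no matter which datanode returns rows first.
SELECT count(*), sum(val), array_agg(val ORDER BY val) FROM t_dist;

-- Redistribute (syntax provisional, per the patch being reviewed).
ALTER TABLE t_dist DISTRIBUTE BY REPLICATION;

-- The same aggregates must be unchanged afterwards.
SELECT count(*), sum(val), array_agg(val ORDER BY val) FROM t_dist;

-- For point 4, one way to exercise the rebuilt index: force an index
-- scan and compare against the ordered baseline.
SET enable_seqscan = off;
SELECT val FROM t_dist WHERE val BETWEEN 10 AND 20 ORDER BY val;
RESET enable_seqscan;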
>>>> 5. I did not understand the significance of the tests where you add and drop a column and redistribute the data. The SELECT after the redistribution is not testing anything specific for the added/dropped column.
>>>
>>> The internal, let's say default, layer of the distribution mechanism uses an internal COPY, and it is important to do this check and correctly bypass the columns that are dropped. The SELECT is just here to check that the data has been redistributed correctly.
>>>
>>>> 6. There are no testcases which would change the distribution type and node list at the same time. Please add those. (I am assuming that these two operations are possible together.)
>>>
>>> Yeah, sorry, I have been working on that today and added some additional tests that can do that. They are in the bucket; I just didn't send the absolutely latest version.
>>>
>>>> 7. Negative testcases need to be improved.
>>>
>>> What are the negative test cases? It would be cool if you could be more precise.
>>
>> Tests which do negative testing (http://www.sqatester.com/methodology/PositiveandNegativeTesting.htm)
>
> So you mean that I need to reformat and refactor my test cases??!
>
>>>> Additional feature
>>>> ==================
>>>> It will be helpful to add the distribution information to the output of the \d command for tables. It will be a good tool for tests to check whether the catalogs have been updated correctly or not. Please add this feature before we complete ALTER TABLE. It shouldn't take much time. Please provide this as a separate patch.
>>>
>>> +1.
>>> This is a good idea, and I recall we had this discussion a couple of months ago. However, it is not directly related to redistribution, so it should be provided after committing the redistribution work, I believe.
>>
>> It will help in testing the feature. For example, you can just do \d on the redistributed table to see if the catalogs have been updated correctly or not. So it's better to do it before this ALTER TABLE, so that you can use it in the tests. It should have been done when the work related to the subcluster was done, even before, when XC was started :). Anyway, the earlier the better.
>>
>>> Also, I think we shouldn't use \d as it will impact other applications like pgadmin, for instance. We should use an extension of \d, for example \dZ. This is just a suggestion; I don't know which commands are still not in use.
>>
>> \d is for describing a relation at bare minimum. In XC the distribution strategy becomes an integral part of a relation, and thus should be part of the \d output. Applications using \d will need a change, but how many applications connect via psql to fire commands (very few, I guess), so we are not in much trouble.
>
> We are not sure about that, honestly! Just to be safe, we should use another command. It also impacts XC transparency with postgres. I also believe it is an add-on, which is not part of the redistribution core.
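Until such a psql change lands, the same information is already queryable from the catalog. A minimal sketch, assuming the pgxc_class catalog of XC 1.0 (column names per the 1.0 catalogs; treat them as an assumption):

-- Distribution strategy per table: pclocatortype is typically
-- 'R' = replicated, 'H' = hash, 'M' = modulo, 'N' = round robin.
SELECT x.pcrelid::regclass AS table_name,
       x.pclocatortype    AS locator_type,
       x.pcattnum         AS distribution_column_attnum
FROM pgxc_class x;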
>
> --
> Michael Paquier
> http://michael.otacoo.com

--
StormDB - http://www.stormdb.com
The Database Cloud
From: Abbas B. <abb...@en...> - 2012-07-09 20:04:11
Here is the update on the issue.

It was decided that the changes done by the data nodes to the command id should be communicated back to the coordinator, and that the coordinator should choose the largest of all the received values as the next command id.

It was suggested that we should check that a skipped value of command id does not create a problem for subsequent operations on the table. I have verified this both by studying the code and by actually changing the function CommandCounterIncrement to increment the command id by 3 and running regression. It worked fine, so a hole in the command id sequence is not a problem.

Next it was suggested that we should use the mechanism currently in place for sending the # of tuples affected by a statement to communicate the changed command id to the coordinator. Please refer to this link in the documentation: http://www.postgresql.org/docs/9.1/static/protocol-message-formats.html

Note that no message format exists in the current over-the-wire protocol for communicating the # of tuples affected by a statement. The libpq functions that we might suspect of doing so are PQntuples and PQcmdTuples. PQntuples simply returns the ntups member of PGresult, whereas PQcmdTuples extracts the # of tuples affected from the CommandComplete 'C' message string. We cannot use these mechanisms for our purpose.

I evaluated the use of NoticeResponse 'N' for sending the changed command id, but the message format of NoticeResponse mandates the use of certain fields which would make our messages unnecessarily bulky and would consume network bandwidth for no reason.

I therefore suggest that we use a new message for communicating XC-specific information from data node to coordinator. Currently we will use it for the command id, but we will design the message format to be flexible enough to accommodate future XC requirements. Whenever the data node increments the command id, we will send the information to the coordinator, and the handle_response function in execRemote.c will be changed to accommodate the new message. Since coordinators will never forward the new message to clients, existing clients do not need to bother.

Comments or suggestions are welcome.

Regards
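That a hole in the command id sequence is harmless can also be seen from plain SQL, using the same trick this thread uses elsewhere: in vanilla PostgreSQL, every statement in an explicit transaction consumes a command id even when it touches zero rows. A minimal sketch against the tt1 table used later in this thread:

BEGIN;
INSERT INTO tt1 VALUES (1);      -- stored with cmin 0
DELETE FROM tt1 WHERE false;     -- consumes command id 1, writes nothing
INSERT INTO tt1 VALUES (2);      -- stored with cmin 2: a hole at 1
SELECT xmin, cmin, f1 FROM tt1;  -- both rows visible despite the gap
COMMIT;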
On Wed, Jul 4, 2012 at 8:35 AM, Abbas Butt <abb...@en...> wrote:

> While fixing the regression failures resulting from the changes done by the patch, I was able to fix all except this test case:
>
> set enforce_two_phase_commit = off;
>
> CREATE TEMP TABLE users (
>   id INT PRIMARY KEY,
>   name VARCHAR NOT NULL
> ) DISTRIBUTE BY REPLICATION;
>
> INSERT INTO users VALUES (1, 'Jozko');
> INSERT INTO users VALUES (2, 'Ferko');
> INSERT INTO users VALUES (3, 'Samko');
>
> CREATE TEMP TABLE tasks (
>   id INT PRIMARY KEY,
>   owner INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL,
>   worker INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL,
>   checked_by INT REFERENCES users ON UPDATE CASCADE ON DELETE SET NULL
> ) DISTRIBUTE BY REPLICATION;
>
> INSERT INTO tasks VALUES (1,1,NULL,NULL);
> INSERT INTO tasks VALUES (2,2,2,NULL);
> INSERT INTO tasks VALUES (3,3,3,3);
>
> BEGIN;
> UPDATE tasks set id=id WHERE id=2;
> SELECT * FROM tasks;
> DELETE FROM users WHERE id = 2;
> SELECT * FROM tasks;
> COMMIT;
>
> The output obtained from the last select statement is
>
>  id | owner | worker | checked_by
> ----+-------+--------+------------
>   1 |     1 |        |
>   3 |     3 |      3 |          3
>   2 |     2 |      2 |
> (3 rows)
>
> whereas the expected output is
>
>  id | owner | worker | checked_by
> ----+-------+--------+------------
>   1 |     1 |        |
>   3 |     3 |      3 |          3
>   2 |       |        |
> (3 rows)
>
> Note that owner and worker have been set to null due to "ON DELETE SET NULL".
>
> Here is the reason why this does not work properly. Consider the last transaction:
>
> BEGIN;
> UPDATE tasks set id=id WHERE id=2;
> SELECT * FROM tasks;
> DELETE FROM users WHERE id = 2;
> SELECT * FROM tasks;
> COMMIT;
>
> Here are the command id values the coordinator sends to the data node:
>
> 0 for the first update, which gets incremented to 1 because this is a DML and needs to consume a command id;
> 1 for the first select, which remains 1 since it is not required to be consumed;
> 1 for the delete statement, which gets incremented to 2 because it is a DML;
> and 2 for the last select.
>
> Now this is what happens on the data node:
>
> When the data node receives the first update with command id 0, it increments it once due to the update itself and once due to the update run because of "ON UPDATE CASCADE". Hence the command id at the end of the update on the data node is 2.
> The first select comes to the data node with command id 1, which is incorrect. The user's intention is to see data after the update, and its command id should be 2.
> Now the delete comes with command id 1, and the data node increments it once due to the delete itself and once due to the update run because of "ON DELETE SET NULL"; hence the command id at the end of the delete is 3.
> The coordinator now sends the last select with command id 2, which is again incorrect, since the user's intention is to see data after the delete, and the select should have been sent to the data node with command id 3 or 4.
>
> Every time the data node increments the command id due to statements run implicitly, either because of constraints or triggers, this scheme of sending command ids from the coordinator to the data node to solve fetch problems will fail.
>
> A datanode can have a trigger, e.g. one inserting rows thrice on every single insert, and would increment the command id on every insert. Therefore this design cannot work.
>
> Either we have to synchronize command ids between datanode and coordinator through GTM,
> OR
> we will have to send the DECLARE CURSOR down to the datanode.
In this case > however we will not be able to send the cursor query as it is because the > query might contain a join on two tables which exist on a disjoint set of > data nodes. > > Comments or suggestions are welcome. > > > > On Tue, Jun 19, 2012 at 2:43 PM, Abbas Butt <abb...@en...>wrote: > >> Thanks for your comments. >> >> On Tue, Jun 19, 2012 at 1:54 PM, Ashutosh Bapat < >> ash...@en...> wrote: >> >>> Hi Abbas, >>> I have few comments to make >>> 1. With this patch there are two variables for having command Id, that >>> is going to cause confusion and will be a maintenance burden, might be >>> error prone. Is it possible to use a single variable instead of two? >> >> >> Are you talking about receivedCommandId and currentCommandId? If yes, I >> would prefer not having a packet received from coordinator overwrite the >> currentCommandId at data node, because I am not 100% sure about the life >> time of currentCommandId, I might overwrite it before time. It would be >> safe to let currentCommandId as is unless we are compelled to get the next >> command ID, and have the received command id take priority at that time. >> >> >>> Right now there is some code which is specific to cursors in your patch. >>> If you can plug the coordinator command id somehow into currentCommandId, >>> you won't need that code and any other code which needs coordinator command >>> ID will be automatically taken care of. >>> >> >> That code is required to solve a problem. Consider this case when a >> coordinator received this transaction >> >> >> BEGIN; >> insert into tt1 values(1); >> declare c50 cursor for select * from tt1; >> insert into tt1 values(2); >> fetch all from c50; >> COMMIT; >> >> While sending select to the data node in response to a fetch we need to >> know what was the command ID of the declare cursor statement and we need to >> send that command ID to the data node for this particular fetch. This is >> the main idea behind this solution. >> >> The first insert goes to the data node with command id 0, the second >> insert goes with 2. Command ID 1 is consumed by declare cursor. When >> coordinator sees fetch it needs to send select to the data node with >> command ID 1 rather than 3. >> >> >> >>> 2. A non-transaction on coordinator can spawn tranasactions on datanode >>> or subtransactions (if there is already a transaction running). Does your >>> patch handle that case? >> >> >> No and it does not need to, because that case has no known problems that >> we need to solve. I don't think my patch would impact any such case but I >> will analyze any failures that I may get in regressions. >> >> >>> Should we do more thorough research in the transaction management, esp. >>> to see the impact of getting same command id for two commands on the >>> datanode? >>> >> >> If we issue two commands with the same command ID then we will definitely >> have visibility issues according to the rules I have already explained. But >> we will not have two commands sent to the data node with same command id. >> >> >>> >>> >>> On Tue, Jun 19, 2012 at 1:56 PM, Abbas Butt <abb...@en... >>> > wrote: >>> >>>> Hi Ashutosh, >>>> Here are the results with the val column, Thanks. 
>>>> >>>> test=# drop table mvcc_demo; >>>> DROP TABLE >>>> test=# >>>> test=# create table mvcc_demo (val int); >>>> CREATE TABLE >>>> test=# >>>> test=# TRUNCATE mvcc_demo; >>>> TRUNCATE TABLE >>>> test=# >>>> test=# BEGIN; >>>> BEGIN >>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >>>> combo id would be different >>>> DELETE 0 >>>> test=# DELETE FROM mvcc_demo; >>>> DELETE 0 >>>> test=# DELETE FROM mvcc_demo; >>>> DELETE 0 >>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>> INSERT 0 1 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+------+-----------+------------- >>>> 80689 | 0 | 3 | f >>>> 80689 | 0 | 4 | f >>>> 80689 | 0 | 5 | f >>>> (3 rows) >>>> >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>> xmin | xmax | cmin | cmax | val >>>> -------+------+------+------+----- >>>> 80689 | 0 | 3 | 3 | 1 >>>> 80689 | 0 | 4 | 4 | 2 >>>> 80689 | 0 | 5 | 5 | 3 >>>> >>>> (3 rows) >>>> >>>> test=# >>>> test=# DELETE FROM mvcc_demo; >>>> DELETE 3 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+-------+-----------+------------- >>>> 80689 | 80689 | 0 | t >>>> 80689 | 80689 | 1 | t >>>> 80689 | 80689 | 2 | t >>>> (3 rows) >>>> >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>> xmin | xmax | cmin | cmax | val >>>> ------+------+------+------+----- >>>> (0 rows) >>>> >>>> >>>> test=# >>>> test=# END; >>>> COMMIT >>>> test=# >>>> test=# >>>> test=# TRUNCATE mvcc_demo; >>>> TRUNCATE TABLE >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> test=# BEGIN; >>>> BEGIN >>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>> INSERT 0 1 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+------+-----------+------------- >>>> 80693 | 0 | 0 | f >>>> 80693 | 0 | 1 | f >>>> 80693 | 0 | 2 | f >>>> (3 rows) >>>> >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>> xmin | xmax | cmin | cmax | val >>>> -------+------+------+------+----- >>>> 80693 | 0 | 0 | 0 | 1 >>>> 80693 | 0 | 1 | 1 | 2 >>>> 80693 | 0 | 2 | 2 | 3 >>>> (3 rows) >>>> >>>> test=# >>>> test=# UPDATE mvcc_demo SET val = 10; >>>> >>>> UPDATE 3 >>>> test=# >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM 
heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+-------+-----------+------------- >>>> 80693 | 80693 | 0 | t >>>> 80693 | 80693 | 1 | t >>>> 80693 | 80693 | 2 | t >>>> 80693 | 0 | 3 | f >>>> 80693 | 0 | 3 | f >>>> 80693 | 0 | 3 | f >>>> (6 rows) >>>> >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>> xmin | xmax | cmin | cmax | val >>>> -------+------+------+------+----- >>>> 80693 | 0 | 3 | 3 | 10 >>>> 80693 | 0 | 3 | 3 | 10 >>>> 80693 | 0 | 3 | 3 | 10 >>>> (3 rows) >>>> >>>> >>>> test=# >>>> test=# END; >>>> COMMIT >>>> test=# >>>> test=# TRUNCATE mvcc_demo; >>>> TRUNCATE TABLE >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -- From one psql issue >>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>> INSERT 0 1 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+------+-----------+------------- >>>> 80699 | 0 | 0 | f >>>> (1 row) >>>> >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>> xmin | xmax | cmin | cmax | val >>>> -------+------+------+------+----- >>>> 80699 | 0 | 0 | 0 | 1 >>>> (1 row) >>>> >>>> >>>> >>>> >>>> >>>> test=# -- From another issue >>>> test=# BEGIN; >>>> BEGIN >>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (4); >>>> INSERT 0 1 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+------+-----------+------------- >>>> 80699 | 0 | 0 | f >>>> 80700 | 0 | 0 | f >>>> 80700 | 0 | 1 | f >>>> 80700 | 0 | 2 | f >>>> (4 rows) >>>> >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>> xmin | xmax | cmin | cmax | val >>>> -------+------+------+------+----- >>>> 80699 | 0 | 0 | 0 | 1 >>>> 80700 | 0 | 0 | 0 | 2 >>>> 80700 | 0 | 1 | 1 | 3 >>>> 80700 | 0 | 2 | 2 | 4 >>>> (4 rows) >>>> >>>> test=# >>>> test=# UPDATE mvcc_demo SET val = 10; >>>> >>>> UPDATE 4 >>>> test=# >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+-------+-----------+------------- >>>> 80700 | 80700 | 0 | t >>>> 80700 | 80700 | 1 | t >>>> 80700 | 80700 | 2 | t >>>> 80699 | 80700 | 3 | f >>>> 80700 | 0 | 3 | f >>>> 80700 | 0 | 3 | f >>>> 80700 | 0 | 3 | f >>>> 80700 | 0 | 3 | f >>>> (8 rows) >>>> >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>> xmin | xmax | cmin | cmax | val >>>> -------+------+------+------+----- >>>> 80700 | 0 | 3 | 3 | 10 >>>> 80700 | 0 | 3 | 3 | 10 >>>> 80700 | 0 | 3 | 3 | 10 >>>> 80700 | 0 | 3 | 3 | 10 >>>> (4 rows) >>>> >>>> >>>> >>>> >>>> test=# -- Before 
finishing this, issue these from the first psql >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+-------+-----------+------------- >>>> 80700 | 80700 | 0 | t >>>> 80700 | 80700 | 1 | t >>>> 80700 | 80700 | 2 | t >>>> 80699 | 80700 | 3 | f >>>> 80700 | 0 | 3 | f >>>> 80700 | 0 | 3 | f >>>> 80700 | 0 | 3 | f >>>> 80700 | 0 | 3 | f >>>> (8 rows) >>>> >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >>>> xmin | xmax | cmin | cmax | val >>>> -------+-------+------+------+----- >>>> 80699 | 80700 | 3 | 3 | 1 >>>> (1 row) >>>> >>>> test=# end; >>>> COMMIT >>>> >>>> >>>> On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier < >>>> mic...@gm...> wrote: >>>> >>>>> Hi, >>>>> >>>>> I expect pgxc_node_send_cmd_id to have some impact on performance, so >>>>> be sure to send it to remote Datanodes really only if necessary. >>>>> You should put more severe conditions blocking this function cid can >>>>> easily get incremented in Postgres. >>>>> >>>>> Regards, >>>>> >>>>> On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt < >>>>> abb...@en...> wrote: >>>>> >>>>>> PFA a WIP patch implementing the design presented earlier. >>>>>> The patch is WIP because it still has and FIXME and it shows some >>>>>> regression failures that need to be fixed, but other than that it confirms >>>>>> that the suggested design would work fine. The following test cases now >>>>>> work fine >>>>>> >>>>>> drop table tt1; >>>>>> create table tt1(f1 int) distribute by replication; >>>>>> >>>>>> >>>>>> BEGIN; >>>>>> insert into tt1 values(1); >>>>>> declare c50 cursor for select * from tt1; >>>>>> insert into tt1 values(2); >>>>>> fetch all from c50; >>>>>> COMMIT; >>>>>> truncate table tt1; >>>>>> >>>>>> BEGIN; >>>>>> >>>>>> declare c50 cursor for select * from tt1; >>>>>> insert into tt1 values(1); >>>>>> >>>>>> insert into tt1 values(2); >>>>>> fetch all from c50; >>>>>> COMMIT; >>>>>> truncate table tt1; >>>>>> >>>>>> >>>>>> BEGIN; >>>>>> insert into tt1 values(1); >>>>>> insert into tt1 values(2); >>>>>> >>>>>> declare c50 cursor for select * from tt1; >>>>>> insert into tt1 values(3); >>>>>> >>>>>> fetch all from c50; >>>>>> COMMIT; >>>>>> truncate table tt1; >>>>>> >>>>>> >>>>>> BEGIN; >>>>>> insert into tt1 values(1); >>>>>> declare c50 cursor for select * from tt1; >>>>>> insert into tt1 values(2); >>>>>> declare c51 cursor for select * from tt1; >>>>>> insert into tt1 values(3); >>>>>> fetch all from c50; >>>>>> fetch all from c51; >>>>>> COMMIT; >>>>>> truncate table tt1; >>>>>> >>>>>> >>>>>> BEGIN; >>>>>> insert into tt1 values(1); >>>>>> declare c50 cursor for select * from tt1; >>>>>> declare c51 cursor for select * from tt1; >>>>>> insert into tt1 values(2); >>>>>> insert into tt1 values(3); >>>>>> fetch all from c50; >>>>>> fetch all from c51; >>>>>> COMMIT; >>>>>> truncate table tt1; >>>>>> >>>>>> >>>>>> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt < >>>>>> abb...@en...> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> In a multi-statement transaction each statement is given a command >>>>>>> identifier >>>>>>> starting from zero and incrementing for each statement. 
>>>>>>> These command indentifers are required for extra tracking because >>>>>>> each >>>>>>> statement has its own visibility rules with in the transaction. >>>>>>> For example, a cursor’s contents must remain unchanged even if later >>>>>>> statements in the >>>>>>> same transaction modify rows. Such tracking is implemented using >>>>>>> system command id >>>>>>> columns cmin/cmax, which is internally actually is a single column. >>>>>>> >>>>>>> cmin/cmax come into play in case of multi-statement transactions >>>>>>> only, >>>>>>> they are both zero otherwise. >>>>>>> >>>>>>> cmin "The command identifier of the statement within the inserting >>>>>>> transaction." >>>>>>> cmax "The command identifier of the statement within the deleting >>>>>>> transaction." >>>>>>> >>>>>>> Here are the visibility rules (taken from comments of tqual.c) >>>>>>> >>>>>>> ( // A heap tuple is valid >>>>>>> "now" iff >>>>>>> Xmin == my-transaction && // inserted by the current >>>>>>> transaction >>>>>>> Cmin < my-command && // before this command, and >>>>>>> ( >>>>>>> Xmax is null || // the row has not been >>>>>>> deleted, or >>>>>>> ( >>>>>>> Xmax == my-transaction && // it was deleted by the >>>>>>> current transaction >>>>>>> Cmax >= my-command // but not before this >>>>>>> command, >>>>>>> ) >>>>>>> ) >>>>>>> ) >>>>>>> || // or >>>>>>> ( >>>>>>> Xmin is committed && // the row was inserted by >>>>>>> a committed transaction, and >>>>>>> ( >>>>>>> Xmax is null || // the row has not been >>>>>>> deleted, or >>>>>>> ( >>>>>>> Xmax == my-transaction && // the row is being deleted >>>>>>> by this transaction >>>>>>> Cmax >= my-command) || // but it's not deleted >>>>>>> "yet", or >>>>>>> ( >>>>>>> Xmax != my-transaction && // the row was deleted by >>>>>>> another transaction >>>>>>> Xmax is not committed // that has not been >>>>>>> committed >>>>>>> ) >>>>>>> ) >>>>>>> ) >>>>>>> ) >>>>>>> >>>>>>> Because cmin and cmax are internally a single system column, >>>>>>> it is therefore not possible to simply record the status of a row >>>>>>> that is created and expired in the same multi-statement transaction. >>>>>>> For that reason, a special combo command id is created that >>>>>>> references >>>>>>> a local memory hash that contains the actual cmin and cmax values. >>>>>>> It means that if combo id is being used the number we are seeing >>>>>>> would not be the cmin or cmax it will be an index into a local >>>>>>> array that contains a structure with has the actual cmin and cmax >>>>>>> values. >>>>>>> >>>>>>> The following queries (taken mostly from >>>>>>> http://momjian.us/main/writings/pgsql/mvcc.pdf) >>>>>>> use the contrib module pageinspect, which allows >>>>>>> visibility of internal heap page structures and all stored rows, >>>>>>> including those not visible in the current snapshot. >>>>>>> (Bit 0x0020 is defined as HEAP_COMBOCID.) 
>>>>>>> >>>>>>> We are exploring 3 examples here: >>>>>>> 1) INSERT & DELETE in a single transaction >>>>>>> 2) INSERT & UPDATE in a single transaction >>>>>>> 3) INSERT from two different transactions & UPDATE from one >>>>>>> >>>>>>> test=# drop table mvcc_demo; >>>>>>> DROP TABLE >>>>>>> test=# >>>>>>> test=# create table mvcc_demo (val int); >>>>>>> CREATE TABLE >>>>>>> test=# >>>>>>> test=# TRUNCATE mvcc_demo; >>>>>>> TRUNCATE TABLE >>>>>>> test=# >>>>>>> test=# BEGIN; >>>>>>> BEGIN >>>>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >>>>>>> combo id would be different >>>>>>> DELETE 0 >>>>>>> test=# DELETE FROM mvcc_demo; >>>>>>> DELETE 0 >>>>>>> test=# DELETE FROM mvcc_demo; >>>>>>> DELETE 0 >>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>>>> INSERT 0 1 >>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>>>> INSERT 0 1 >>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>>>> INSERT 0 1 >>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>> is_combocid >>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>> -------+------+-----------+------------- >>>>>>> 80685 | 0 | 3 | f >>>>>>> 80685 | 0 | 4 | f >>>>>>> 80685 | 0 | 5 | f >>>>>>> (3 rows) >>>>>>> >>>>>>> test=# >>>>>>> test=# DELETE FROM mvcc_demo; >>>>>>> DELETE 3 >>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>> is_combocid >>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>> -------+-------+-----------+------------- >>>>>>> 80685 | 80685 | 0 | t >>>>>>> 80685 | 80685 | 1 | t >>>>>>> 80685 | 80685 | 2 | t >>>>>>> (3 rows) >>>>>>> >>>>>>> Note that since is_combocid is true the numbers are not cmin/cmax >>>>>>> they are actually >>>>>>> the indexes of the internal array already explained above. 
>>>>>>> combo id index 0 would contain cmin 3, cmax 6 >>>>>>> combo id index 1 would contain cmin 4, cmax 6 >>>>>>> combo id index 2 would contain cmin 5, cmax 6 >>>>>>> >>>>>>> test=# >>>>>>> test=# END; >>>>>>> COMMIT >>>>>>> test=# >>>>>>> test=# >>>>>>> test=# TRUNCATE mvcc_demo; >>>>>>> TRUNCATE TABLE >>>>>>> test=# >>>>>>> test=# >>>>>>> test=# >>>>>>> test=# BEGIN; >>>>>>> BEGIN >>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>>>> INSERT 0 1 >>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>>>> INSERT 0 1 >>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>>>> INSERT 0 1 >>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>> is_combocid >>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>> -------+------+-----------+------------- >>>>>>> 80675 | 0 | 0 | f >>>>>>> 80675 | 0 | 1 | f >>>>>>> 80675 | 0 | 2 | f >>>>>>> (3 rows) >>>>>>> >>>>>>> test=# >>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >>>>>>> UPDATE 3 >>>>>>> test=# >>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>> is_combocid >>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>> -------+-------+-----------+------------- >>>>>>> 80675 | 80675 | 0 | t >>>>>>> 80675 | 80675 | 1 | t >>>>>>> 80675 | 80675 | 2 | t >>>>>>> 80675 | 0 | 3 | f >>>>>>> 80675 | 0 | 3 | f >>>>>>> 80675 | 0 | 3 | f >>>>>>> (6 rows) >>>>>>> >>>>>>> test=# >>>>>>> test=# END; >>>>>>> COMMIT >>>>>>> test=# >>>>>>> test=# >>>>>>> test=# TRUNCATE mvcc_demo; >>>>>>> TRUNCATE TABLE >>>>>>> test=# >>>>>>> >>>>>>> -- From one psql issue >>>>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>>>> INSERT 0 1 >>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>> is_combocid >>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>> -------+------+-----------+------------- >>>>>>> 80677 | 0 | 0 | f >>>>>>> (1 row) >>>>>>> >>>>>>> >>>>>>> test=# -- From another issue >>>>>>> test=# BEGIN; >>>>>>> BEGIN >>>>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>>>> INSERT 0 1 >>>>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>>>> INSERT 0 1 >>>>>>> test=# INSERT INTO mvcc_demo VALUES (4); >>>>>>> INSERT 0 1 >>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>> is_combocid >>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>> -------+------+-----------+------------- >>>>>>> 80677 | 0 | 0 | f >>>>>>> 80678 | 0 | 0 | f >>>>>>> 80678 | 0 | 1 | f >>>>>>> 80678 | 0 | 2 | f >>>>>>> (4 rows) >>>>>>> >>>>>>> test=# >>>>>>> test=# UPDATE mvcc_demo SET val = val * 10; >>>>>>> UPDATE 4 >>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>> test-# 
t_field3::text::int8 AS cmin_cmax, >>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>> is_combocid >>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>> -------+-------+-----------+------------- >>>>>>> 80678 | 80678 | 0 | t >>>>>>> 80678 | 80678 | 1 | t >>>>>>> 80678 | 80678 | 2 | t >>>>>>> 80677 | 80678 | 3 | f >>>>>>> 80678 | 0 | 3 | f >>>>>>> 80678 | 0 | 3 | f >>>>>>> 80678 | 0 | 3 | f >>>>>>> 80678 | 0 | 3 | f >>>>>>> (8 rows) >>>>>>> >>>>>>> test=# >>>>>>> >>>>>>> test=# -- Before finishing this, issue these from the first psql >>>>>>> test=# SELECT t_xmin AS xmin, >>>>>>> test-# t_xmax::text::int8 AS xmax, >>>>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>>>> is_combocid >>>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>>>> test-# ORDER BY 2 DESC, 3; >>>>>>> xmin | xmax | cmin_cmax | is_combocid >>>>>>> -------+-------+-----------+------------- >>>>>>> 80678 | 80678 | 0 | t >>>>>>> 80678 | 80678 | 1 | t >>>>>>> 80678 | 80678 | 2 | t >>>>>>> 80677 | 80678 | 3 | f >>>>>>> 80678 | 0 | 3 | f >>>>>>> 80678 | 0 | 3 | f >>>>>>> 80678 | 0 | 3 | f >>>>>>> 80678 | 0 | 3 | f >>>>>>> (8 rows) >>>>>>> >>>>>>> test=# END; >>>>>>> COMMIT >>>>>>> >>>>>>> >>>>>>> Now consider the case we are trying to solve >>>>>>> >>>>>>> drop table tt1; >>>>>>> create table tt1(f1 int); >>>>>>> >>>>>>> BEGIN; >>>>>>> insert into tt1 values(1); >>>>>>> declare c50 cursor for select * from tt1; -- should show one row >>>>>>> only >>>>>>> insert into tt1 values(2); >>>>>>> fetch all from c50; >>>>>>> COMMIT; >>>>>>> >>>>>>> >>>>>>> Consider Data node 1 log >>>>>>> >>>>>>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>>>>> committed READ WRITE] >>>>>>> (b) [exec_simple_query][1026][drop table tt1;] >>>>>>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] >>>>>>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] >>>>>>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>>>>> committed READ WRITE] >>>>>>> (f) [exec_simple_query][1026][create table tt1(f1 int);] >>>>>>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] >>>>>>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] >>>>>>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>>>>> committed READ WRITE] >>>>>>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] >>>>>>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] >>>>>>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() >>>>>>> FROM tt1] >>>>>>> (m) [exec_simple_query][1026][COMMIT TRANSACTION] >>>>>>> >>>>>>> The cursor currently shows both inserted rows because command id at >>>>>>> data node in >>>>>>> step (j) is 0 >>>>>>> step (k) is 1 & >>>>>>> step (l) is 2 >>>>>>> >>>>>>> Where as we need command ids to be >>>>>>> >>>>>>> step (j) should be 0 >>>>>>> step (k) should be 2 & >>>>>>> step (l) should be 1 >>>>>>> >>>>>>> This will solve the cursor visibility problem. >>>>>>> >>>>>>> To implement this I suggest we send command IDs to data nodes from >>>>>>> the coordinator >>>>>>> like we send gxid. The only difference will be that we do not need >>>>>>> to take command IDs >>>>>>> from GTM since they are only valid with in the transaction. 
>>>>>>> >>>>>>> See this example >>>>>>> >>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>> ------+------+------+------+---- >>>>>>> (0 rows) >>>>>>> >>>>>>> test=# begin; >>>>>>> BEGIN >>>>>>> test=# insert into tt1 values(1); >>>>>>> INSERT 0 1 >>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>> -------+------+------+------+---- >>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>> (1 row) >>>>>>> >>>>>>> test=# insert into tt1 values(2); >>>>>>> INSERT 0 1 >>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>> -------+------+------+------+---- >>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>> (2 rows) >>>>>>> >>>>>>> test=# insert into tt1 values(3); >>>>>>> INSERT 0 1 >>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>> -------+------+------+------+---- >>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>> (3 rows) >>>>>>> >>>>>>> test=# insert into tt1 values(4); >>>>>>> INSERT 0 1 >>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>> -------+------+------+------+---- >>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>>> (4 rows) >>>>>>> >>>>>>> test=# end; >>>>>>> COMMIT >>>>>>> test=# >>>>>>> test=# >>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>> -------+------+------+------+---- >>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>>> (4 rows) >>>>>>> >>>>>>> test=# insert into tt1 values(5); >>>>>>> INSERT 0 1 >>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>> -------+------+------+------+---- >>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>>> 80616 | 0 | 0 | 0 | 5 >>>>>>> (5 rows) >>>>>>> >>>>>>> test=# insert into tt1 values(6); >>>>>>> INSERT 0 1 >>>>>>> test=# >>>>>>> test=# >>>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>>>> xmin | xmax | cmin | cmax | f1 >>>>>>> -------+------+------+------+---- >>>>>>> 80615 | 0 | 0 | 0 | 1 >>>>>>> 80615 | 0 | 1 | 1 | 2 >>>>>>> 80615 | 0 | 2 | 2 | 3 >>>>>>> 80615 | 0 | 3 | 3 | 4 >>>>>>> 80616 | 0 | 0 | 0 | 5 >>>>>>> 80617 | 0 | 0 | 0 | 6 >>>>>>> (6 rows) >>>>>>> >>>>>>> Note that at the end of the multi-statement transaction the command >>>>>>> id gets reset to zero. >>>>>>> >>>>>>> -- >>>>>>> Abbas >>>>>>> Architect >>>>>>> EnterpriseDB Corporation >>>>>>> The Enterprise PostgreSQL Company >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> -- >>>>>> Abbas >>>>>> Architect >>>>>> EnterpriseDB Corporation >>>>>> The Enterprise PostgreSQL Company >>>>>> >>>>>> Phone: 92-334-5100153 >>>>>> >>>>>> Website: www.enterprisedb.com >>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>>> >>>>>> This e-mail message (and any attachment) is intended for the use of >>>>>> the individual or entity to whom it is addressed. This message >>>>>> contains information from EnterpriseDB Corporation that may be >>>>>> privileged, confidential, or exempt from disclosure under applicable >>>>>> law. 
>>>>> -- >>>>> Michael Paquier >>>>> http://michael.otacoo.com >>>> >>>> -- >>>> Abbas >>>> Architect >>>> EnterpriseDB Corporation >>>> The Enterprise PostgreSQL Company
>>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Enterprise Postgres Company >> >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company > > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company |
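A side note on the cursor example above: the behavior the command-ID fix is chasing can be checked on a single vanilla PostgreSQL node. A minimal psql sketch, reusing the tt1 table from the thread:

    BEGIN;
    INSERT INTO tt1 VALUES (1);
    DECLARE c50 CURSOR FOR SELECT * FROM tt1;
    INSERT INTO tt1 VALUES (2);
    FETCH ALL FROM c50;   -- returns only the row (1)
    COMMIT;

The FETCH returns a single row because a row inserted by the current transaction is visible to the cursor only if its cmin is lower than the command id at which the cursor's snapshot was taken. That is exactly the invariant that shipping coordinator-assigned command ids down to the datanodes is meant to restore.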
From: Shankar H. <har...@ya...> - 2012-07-09 16:12:04
|
Thanks Ashutosh. You are right, while running this test I just had pgbench running against one coordinator. Looks like pgbench by itself may not be an apt tool for this kind of testing, so I will instead run pgbench's underlying SQL script from the command line against both coordinators. Thanks for that tip. I got a lot of input on my problem from a lot of folks on the list, and the feedback is much appreciated. Thanks everybody! On max_prepared_transactions, I will factor in the number of coordinators and the max_connections on each coordinator while arriving at a figure. I will also try out Koichi Suzuki's suggestion to have multiple NICs on the GTM. I will post my findings here for the same cluster configuration as before. thanks, Shankar ________________________________ From: Ashutosh Bapat <ash...@en...> To: Shankar Hariharan <har...@ya...> Cc: "pos...@li..." <pos...@li...> Sent: Sunday, July 8, 2012 11:02 PM Subject: Re: [Postgres-xc-developers] Question on gtm-proxy Hi Shankar, You have got answers to the prepared transaction problem, I guess. I have something else below. On Sat, Jul 7, 2012 at 1:44 AM, Shankar Hariharan <har...@ya...> wrote: As planned I ran some tests using PGBench on this setup : > > >Node 1 - Coord1, Datanode1, gtm-proxy1 >Node 2- Coord2, Datanode2, gtm-proxy2 >Node 3- Datanode3, gtm > >I was connecting via Coord1 for these tests: >- scale factor of 30 used >- tests run using the following input parameters for pgbench: Try connecting to both the coordinators; it should give you better performance, especially when you are using distributed tables. With distributed tables, the coordinator gets involved in query execution more than in the case of replicated tables. So, balancing load across the two coordinators would help. >
>Clients  Threads  Duration  Transactions
>1        1        100       6204
>2        2        100       9960
>4        4        100       12880
>6        6        100       16768
>8        8        100       19758
>10       10       100       21944
>12       12       100       20674
> > >The run went well until 8 clients. I started seeing errors from 10 clients onwards, and eventually the 14-client run has been hanging around for over an hour now. The errors I have been seeing on the console are the following : > > >pgbench console : >Client 8 aborted in state 12: ERROR: GTM error, could not obtain snapshot > >Client 0 aborted in state 13: ERROR: maximum number of prepared transactions reached >Client 7 aborted in state 13: ERROR: maximum number of prepared transactions reached >Client 11 aborted in state 13: ERROR: maximum number of prepared transactions reached >Client 9 aborted in state 13: ERROR: maximum number of prepared transactions reached > > >node console: >ERROR: GTM error, could not obtain snapshot >STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (253, 26, 1888413, -817, CURRENT_TIMESTAMP); >ERROR: maximum number of prepared transactions reached >HINT: Increase max_prepared_transactions (currently 10). >STATEMENT: PREPARE TRANSACTION 'T201428' >ERROR: maximum number of prepared transactions reached >STATEMENT: END; >ERROR: maximum number of prepared transactions reached >STATEMENT: END; >ERROR: maximum number of prepared transactions reached >STATEMENT: END; >ERROR: maximum number of prepared transactions reached >STATEMENT: END; >ERROR: GTM error, could not obtain snapshot >STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (140, 29, 2416403, -4192, CURRENT_TIMESTAMP); > > >I was also watching the processes on each node and see the following for the 14-client run: > > > > >Node1 : >postgres 25571 10511 0 04:41 ? 
00:00:02 postgres: postgres postgres ::1(33481) TRUNCATE TABLE waiting >postgres 25620 11694 0 04:46 ? 00:00:00 postgres: postgres postgres pgbench-address (50388) TRUNCATE TABLE > > >Node2: >postgres 10979 9631 0 Jul05 ? 00:00:42 postgres: postgres postgres coord1-address(57357) idle in transaction > > > >Node3: > >postgres 20264 9911 0 08:35 ? 00:00:05 postgres: postgres postgres coord1-address(51406) TRUNCATE TABLE waiting > > > > > >I was going to restart the processes on all nodes and start over but did not want to lose this data as it could be useful information. > > >Any explanation on the above issue is much appreciated. I will try the next run with a higher value set for max_prepared_transactions. Any recommendations for a good value on this front? > > > >thanks, >Shankar > > > > > >________________________________ > From: Shankar Hariharan <har...@ya...> >To: Ashutosh Bapat <ash...@en...> >Cc: "pos...@li..." <pos...@li...> >Sent: Friday, July 6, 2012 8:22 AM > >Subject: Re: [Postgres-xc-developers] Question on gtm-proxy > > > >Hi Ashutosh, >I was trying to size the load on a server and was wondering if a GTM could be shared w/o much performance overhead between a small number of datanodes and coordinators. I will post my findings here. >thanks, >Shankar > > > >________________________________ > From: Ashutosh Bapat <ash...@en...> >To: Shankar Hariharan <har...@ya...> >Cc: "pos...@li..." <pos...@li...> >Sent: Friday, July 6, 2012 12:25 AM >Subject: Re: [Postgres-xc-developers] Question on gtm-proxy > > >Hi Shankar, >Running gtm-proxy has shown to improve the performance, because it lessens the load on GTM, by serving requests locally. Why do you want the coordinators to connect directly to the GTM? Are you seeing any performance improvement from doing that? > > >On Fri, Jul 6, 2012 at 10:08 AM, Shankar Hariharan <har...@ya...> wrote: > >Follow up to earlier email. In the setup described below, can I avoid using a gtm-proxy? That is, can I just simply point coordinators to the one gtm running on node 3 ? >>My initial plan was to just run the gtm on node 3 then I thought I could try a datanode without a local coordinator which was why I put these two together on node 3. >>thanks, >>Shankar >> >> >> >>________________________________ >> From: Shankar Hariharan <har...@ya...> >>To: "pos...@li..." <pos...@li...> >>Sent: Thursday, July 5, 2012 11:35 PM >>Subject: Question on multiple coordinators >> >> >>Hello, >> >> >>Am trying out XC 1.0 in the following configuraiton. >>Node 1 - Coord1, Datanode1, gtm-proxy1 >>Node 2- Coord2, Datanode2, gtm-proxy2 >>Node 3- Datanode3, gtm >> >> >>I setup all nodes but forgot to add Coord1 to Coord2 and vice versa. In addition I missed the pg_hba edit as well. So the first table T1 that I created for distribution from Coord1 was not "visible| from Coord2 but was on all the data nodes. >>I tried to get Coord2 backinto business in various ways but the first table I created refused to show up on Coord2 : >>- edit pg_hba and add node on both coord1 and 2. Then run select pgxc_pool_reload(); >>- restart coord 1 and 2 >>- drop node c2 from c1 and c1 from c2 and add them back followed by select pgxc_pool_reload(); >> >> >>So I tried to create the same table T1 from Coord2 to observe behavior and it did not like it clearly as all nodes it "wrote" to reported that the table already existed which was good. At this point I could understand that Coord2 and Coord1 are not talking alright so I created a new table from coord1 with replication. 
This table was visible from both now. >> >> >>Question is should I expect to see the first table, let me call it T1 after a while from Coord2 also? >> >>thanks, >>Shankar > >-- >Best Wishes, >Ashutosh Bapat >EntepriseDB Corporation >The Enterprise Postgres Company -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
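Regarding the tip to connect to both coordinators: one simple way to balance a pgbench run, without changing the tool itself, is to start two instances in parallel and split the clients between them. A rough sketch, assuming coordinators on hosts node1 and node2, port 5432, and the pgbench database named postgres (host names and numbers are illustrative):

    pgbench -h node1 -p 5432 -c 6 -j 6 -T 100 postgres &
    pgbench -h node2 -p 5432 -c 6 -j 6 -T 100 postgres &
    wait

The two reported TPS figures then add up to the cluster-wide throughput for the combined 12 clients.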
From: pramodh m. <pos...@gm...> - 2012-07-09 13:32:36
|
1) Are triggers fully supported in XC? 2) Can I set up Slony on the datanodes to replicate certain tables to a different Postgres cluster? Pramodh Mereddy |
From: Ashutosh B. <ash...@en...> - 2012-07-09 12:46:08
|
On Mon, Jul 9, 2012 at 5:37 PM, Michael Paquier <mic...@gm...>wrote: > > > On Mon, Jul 9, 2012 at 7:56 PM, Ashutosh Bapat < > ash...@en...> wrote: > >> Hi Michael, >> I had a look at the patch. I mainly focused on the overall content of the >> patch and importantly tests. Before I look at the redistribution code >> thoroughly, I have few comments. >> >> There are many trailing white spaces in the patch. Please fix those, they >> unnecessarily fail the automatic merges sometimes. You can do that when you >> commit the patch. >> > Oh OK, I didn't notice. Do you have some places particularly in mind? > Apply your patch on clean repository using git apply and it will show you. > > >> >> Code >> ==== >> 1. There is a lot of code, which is refactoring existing code, renaming >> functions, which is not necessarily related to redistribution work. Can you >> please provide separate patches for this refactoring? We should commit them >> separately. For example build_subcluster_data() has been renamed (for good >> may be), but it makes sense if we do it separately. Someone looking at the >> ALTER TABLE commit should not get overwhelmed by the extraneous changes. >> > OK. The problem with the functions currently on master was that there name > was not really generic and sometimes did not reflect their real > functionality. So as now the plan is to use them in a more general way, I > think there name is not going to change anymore. > > >> >> 2. Same is the case with the grammar changes. Please separate the grammar >> changes related to pgxc_nodelist etc. into separate patch, although it's >> because of ALTER TABLE you need to do those changes. >> > OK understood. > > >> >> Please get these patches reviewed as well, since I haven't looked at the >> changes proper. >> > Understood, I'll make those 2 patches on tomorrow morning, not a big deal. > > >> >> Tests >> ===== >> 1. There is no need to test with huge data, that slows down regression. >> For performance testing, you can create a separate test (not to be included >> in regression), if you want. >> > That may be an idea. However you are right I'll limit the number of rows > tested. > > > >> 2. We need tests, which will test the plan cache (in)validation upon >> redistribution of data, tests for testing existing views working after the >> redistribution. Please take a look at the PG alter table test for more such >> scenarios. > > OK I'll add those scenarios. They will be included in xc_alter_table. > > >> If you happen to add some performance tests, it would be also good to >> test the sanity of concurrent transactions accessing the object/s being >> redistributed. It's vital considering that such redistribution would run >> for longer. >> > Yes, it would be nice to > > > >> 3. Instead of relying on count(*) to show sanity of the redistributed >> data, you may use better aggregates like array_agg or sum(), avg() and >> count(). I would prefer array_agg over others, since you can list all the >> data values there. You will need aggregate's order by clause (Not that of >> the SELECT). >> 4. In the case of redistribution of table with index, you will need to >> check the sanity of index after the redistribution by some means. >> > Do you have an idea of how to do that? Pick up some tests from postgres? > Good question. But I don't have an answer (specifically for XC, since the indexes are on datanodes). > > >> 5. I did not understand the significance of the tests where you add and >> drop column and redistribute the data. 
The SELECT after the redistribution >> is not testing anything specific for the added/dropped column. >> > The internal, let's say default layer, of distribution mechanism uses an > internal COPY and it is important to do this check and correctly bypass the > columns that are dropped. The SELECT is just here to check that data has > been redistributed correctly. > > > >> 6. There are no testcases which would change the distribution type and >> node list at the same time. Please add those. (I am assuming that these two >> operations are possible together). >> > Yeah sorry, I have been working on that today and added some additional > tests that can do that. > They are in the bucket, just I didn't send the absolutely latest version. > > >> 7. Negative testcases need to improved. >> > What are the negative test cases? It would be cool if you could precise. > Tests which do negative testing ( http://www.sqatester.com/methodology/PositiveandNegativeTesting.htm) > > >> >> Additional feature >> ================== >> It will be helpful to add the distribution information in the output of >> \d command for tables. It will be good tool for tests to check whether the >> catalogs have been updated correctly or not. Please add this feature before >> we complete ALTER TABLE. It shouldn't take much time. Please provide this >> as a separate patch. >> > +1. > This is a good idea, and I recall we had this discussion a couple of > months ago. However it is not directly related with redistribution. So it > should be provided after committing the redistribution work I believe. > It will help in testing the feature. For example, you can just do \d on the redistributed table, to see if catalogs have been updated correctly or not. So, it's better to do it before this ALTER TABLE, so that you can use it in the tests. It should been done when the work related to the subcluster was done, even before when XC was started :). Anyway, earlier the better. > Also, I think we shouldn't use ¥d as it will impact other applications > like pgadmin for instance. We should use an extension of ¥d like for > example ¥dZ. This is just a suggestion, I don't know what are the commands > still not in use. > \d is for describing a relation at bare minimum. In XC distribution strategy becomes an integral part of a relation, and thus should be part of the \d output. Applications using \d will need a change, but how many applications connect via psql to fire commands (very less, I guess), so we are not in much trouble. If one compares changing grammar of say CREATE TABLE after the first release, would be more problematic that this one. > -- > Michael Paquier > http://michael.otacoo.com > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
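On point 7 of the review, negative tests for redistribution would exercise the error paths rather than the success paths. A hedged sketch, borrowing the ALTER TABLE ... TO NODE grammar mentioned elsewhere in this thread (table, column, and node names are placeholders, and the exact error texts are not asserted):

    -- redistribution to a node that does not exist must fail cleanly
    ALTER TABLE t1 TO NODE (no_such_node);
    -- redistributing by a column that does not exist must also fail
    ALTER TABLE t1 DISTRIBUTE BY HASH (no_such_column);

After each failure, a SELECT on t1 should still return the original data, showing that a failed redistribution leaves the table intact.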
From: Michael P. <mic...@gm...> - 2012-07-09 12:27:04
|
On Mon, Jul 9, 2012 at 9:22 PM, Ashutosh Bapat < ash...@en...> wrote: > > > On Mon, Jul 9, 2012 at 5:37 PM, Michael Paquier <mic...@gm... > > wrote: > >> >> >> On Mon, Jul 9, 2012 at 7:56 PM, Ashutosh Bapat < >> ash...@en...> wrote: >> >>> Hi Michael, >>> I had a look at the patch. I mainly focused on the overall content of >>> the patch and importantly tests. Before I look at the redistribution code >>> thoroughly, I have few comments. >>> >>> There are many trailing white spaces in the patch. Please fix those, >>> they unnecessarily fail the automatic merges sometimes. You can do that >>> when you commit the patch. >>> >> Oh OK, I didn't notice. Do you have some places particularly in mind? >> > > Apply your patch on clean repository using git apply and it will show you. > I'll make a test. > > >> >> >>> >>> Code >>> ==== >>> 1. There is a lot of code, which is refactoring existing code, renaming >>> functions, which is not necessarily related to redistribution work. Can you >>> please provide separate patches for this refactoring? We should commit them >>> separately. For example build_subcluster_data() has been renamed (for good >>> may be), but it makes sense if we do it separately. Someone looking at the >>> ALTER TABLE commit should not get overwhelmed by the extraneous changes. >>> >> OK. The problem with the functions currently on master was that there >> name was not really generic and sometimes did not reflect their real >> functionality. So as now the plan is to use them in a more general way, I >> think there name is not going to change anymore. >> >> >>> >>> 2. Same is the case with the grammar changes. Please separate the >>> grammar changes related to pgxc_nodelist etc. into separate patch, although >>> it's because of ALTER TABLE you need to do those changes. >>> >> OK understood. >> >> >>> >>> Please get these patches reviewed as well, since I haven't looked at the >>> changes proper. >>> >> Understood, I'll make those 2 patches on tomorrow morning, not a big deal. >> >> >>> >>> Tests >>> ===== >>> 1. There is no need to test with huge data, that slows down regression. >>> For performance testing, you can create a separate test (not to be included >>> in regression), if you want. >>> >> That may be an idea. However you are right I'll limit the number of rows >> tested. >> >> >> >>> 2. We need tests, which will test the plan cache (in)validation upon >>> redistribution of data, tests for testing existing views working after the >>> redistribution. Please take a look at the PG alter table test for more such >>> scenarios. >> >> OK I'll add those scenarios. They will be included in xc_alter_table. >> >> >>> If you happen to add some performance tests, it would be also good to >>> test the sanity of concurrent transactions accessing the object/s being >>> redistributed. It's vital considering that such redistribution would run >>> for longer. >>> >> Yes, it would be nice to >> >> >> >>> 3. Instead of relying on count(*) to show sanity of the redistributed >>> data, you may use better aggregates like array_agg or sum(), avg() and >>> count(). I would prefer array_agg over others, since you can list all the >>> data values there. You will need aggregate's order by clause (Not that of >>> the SELECT). >>> 4. In the case of redistribution of table with index, you will need to >>> check the sanity of index after the redistribution by some means. >>> >> Do you have an idea of how to do that? Pick up some tests from postgres? >> > > Good question. 
But I don't have an answer (specifically for XC, since the > indexes are on datanodes). > OK I'll figure out myself smth. > > >> >> >>> 5. I did not understand the significance of the tests where you add and >>> drop column and redistribute the data. The SELECT after the redistribution >>> is not testing anything specific for the added/dropped column. >>> >> The internal, let's say default layer, of distribution mechanism uses an >> internal COPY and it is important to do this check and correctly bypass the >> columns that are dropped. The SELECT is just here to check that data has >> been redistributed correctly. >> >> >> >>> 6. There are no testcases which would change the distribution type and >>> node list at the same time. Please add those. (I am assuming that these two >>> operations are possible together). >>> >> Yeah sorry, I have been working on that today and added some additional >> tests that can do that. >> They are in the bucket, just I didn't send the absolutely latest version. >> >> >>> 7. Negative testcases need to improved. >>> >> What are the negative test cases? It would be cool if you could precise. >> > > Tests which do negative testing ( > http://www.sqatester.com/methodology/PositiveandNegativeTesting.htm) > So you mean that I need to reformat and refactor my test cases??! > > >> >> >>> >>> Additional feature >>> ================== >>> It will be helpful to add the distribution information in the output of >>> \d command for tables. It will be good tool for tests to check whether the >>> catalogs have been updated correctly or not. Please add this feature before >>> we complete ALTER TABLE. It shouldn't take much time. Please provide this >>> as a separate patch. >>> >> +1. >> This is a good idea, and I recall we had this discussion a couple of >> months ago. However it is not directly related with redistribution. So it >> should be provided after committing the redistribution work I believe. >> > > It will help in testing the feature. For example, you can just do \d on > the redistributed table, to see if catalogs have been updated correctly or > not. So, it's better to do it before this ALTER TABLE, so that you can use > it in the tests. It should been done when the work related to the > subcluster was done, even before when XC was started :). Anyway, earlier > the better. > > >> Also, I think we shouldn't use ¥d as it will impact other applications >> like pgadmin for instance. We should use an extension of ¥d like for >> example ¥dZ. This is just a suggestion, I don't know what are the commands >> still not in use. >> > > \d is for describing a relation at bare minimum. In XC distribution > strategy becomes an integral part of a relation, and thus should be part of > the \d output. Applications using \d will need a change, but how many > applications connect via psql to fire commands (very less, I guess), so we > are not in much trouble. > We are not sure about that, honestly! Just for security's sake, we should use another command. It also impacts XC transparency with postgres. I also believe it is an add-on, which is not part of the redistribution core. -- Michael Paquier http://michael.otacoo.com |
From: Michael P. <mic...@gm...> - 2012-07-09 12:07:48
|
On Mon, Jul 9, 2012 at 7:56 PM, Ashutosh Bapat < ash...@en...> wrote: > Hi Michael, > I had a look at the patch. I mainly focused on the overall content of the > patch and importantly tests. Before I look at the redistribution code > thoroughly, I have few comments. > > There are many trailing white spaces in the patch. Please fix those, they > unnecessarily fail the automatic merges sometimes. You can do that when you > commit the patch. > Oh OK, I didn't notice. Do you have some places particularly in mind? > > Code > ==== > 1. There is a lot of code, which is refactoring existing code, renaming > functions, which is not necessarily related to redistribution work. Can you > please provide separate patches for this refactoring? We should commit them > separately. For example build_subcluster_data() has been renamed (for good > may be), but it makes sense if we do it separately. Someone looking at the > ALTER TABLE commit should not get overwhelmed by the extraneous changes. > OK. The problem with the functions currently on master was that there name was not really generic and sometimes did not reflect their real functionality. So as now the plan is to use them in a more general way, I think there name is not going to change anymore. > > 2. Same is the case with the grammar changes. Please separate the grammar > changes related to pgxc_nodelist etc. into separate patch, although it's > because of ALTER TABLE you need to do those changes. > OK understood. > > Please get these patches reviewed as well, since I haven't looked at the > changes proper. > Understood, I'll make those 2 patches on tomorrow morning, not a big deal. > > Tests > ===== > 1. There is no need to test with huge data, that slows down regression. > For performance testing, you can create a separate test (not to be included > in regression), if you want. > That may be an idea. However you are right I'll limit the number of rows tested. > 2. We need tests, which will test the plan cache (in)validation upon > redistribution of data, tests for testing existing views working after the > redistribution. Please take a look at the PG alter table test for more such > scenarios. OK I'll add those scenarios. They will be included in xc_alter_table. > If you happen to add some performance tests, it would be also good to test > the sanity of concurrent transactions accessing the object/s being > redistributed. It's vital considering that such redistribution would run > for longer. > Yes, it would be nice to > 3. Instead of relying on count(*) to show sanity of the redistributed > data, you may use better aggregates like array_agg or sum(), avg() and > count(). I would prefer array_agg over others, since you can list all the > data values there. You will need aggregate's order by clause (Not that of > the SELECT). > 4. In the case of redistribution of table with index, you will need to > check the sanity of index after the redistribution by some means. > Do you have an idea of how to do that? Pick up some tests from postgres? > 5. I did not understand the significance of the tests where you add and > drop column and redistribute the data. The SELECT after the redistribution > is not testing anything specific for the added/dropped column. > The internal, let's say default layer, of distribution mechanism uses an internal COPY and it is important to do this check and correctly bypass the columns that are dropped. The SELECT is just here to check that data has been redistributed correctly. > 6. 
There are no testcases which would change the distribution type and > node list at the same time. Please add those. (I am assuming that these two > operations are possible together). > Yeah sorry, I have been working on that today and added some additional tests that can do that. They are in the bucket, I just didn't send the absolutely latest version. > >> 7. Negative testcases need to be improved. > What are the negative test cases? It would be cool if you could be more precise. > > Additional feature > ================== > It will be helpful to add the distribution information in the output of \d > command for tables. It will be a good tool for tests to check whether the > catalogs have been updated correctly or not. Please add this feature before > we complete ALTER TABLE. It shouldn't take much time. Please provide this > as a separate patch. > +1. This is a good idea, and I recall we had this discussion a couple of months ago. However, it is not directly related to redistribution. So it should be provided after committing the redistribution work, I believe. Also, I think we shouldn't use \d as it will impact other applications like pgadmin, for instance. We should use an extension of \d, for example \dZ. This is just a suggestion; I don't know which commands are still not in use. -- Michael Paquier http://michael.otacoo.com |
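Until \d (or a \dZ-style variant) shows the distribution, the catalog can be queried directly to confirm that a redistribution updated it. A hedged sketch; it assumes the pgxc_class catalog exposes pcrelid, pclocatortype, and nodeoids as in the current sources, so verify the column names against your tree:

    SELECT pclocatortype, nodeoids
    FROM pgxc_class
    WHERE pcrelid = 't1'::regclass;   -- 't1' is a placeholder table name

pclocatortype is a one-character code for the distribution strategy (for example 'H' for hash and 'R' for replication), and nodeoids lists the datanodes the table lives on.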
From: Ashutosh B. <ash...@en...> - 2012-07-09 10:56:28
|
Hi Michael, I had a look at the patch. I mainly focused on the overall content of the patch and importantly tests. Before I look at the redistribution code thoroughly, I have few comments. There are many trailing white spaces in the patch. Please fix those, they unnecessarily fail the automatic merges sometimes. You can do that when you commit the patch. Code ==== 1. There is a lot of code, which is refactoring existing code, renaming functions, which is not necessarily related to redistribution work. Can you please provide separate patches for this refactoring? We should commit them separately. For example build_subcluster_data() has been renamed (for good may be), but it makes sense if we do it separately. Someone looking at the ALTER TABLE commit should not get overwhelmed by the extraneous changes. 2. Same is the case with the grammar changes. Please separate the grammar changes related to pgxc_nodelist etc. into separate patch, although it's because of ALTER TABLE you need to do those changes. Please get these patches reviewed as well, since I haven't looked at the changes proper. Tests ===== 1. There is no need to test with huge data, that slows down regression. For performance testing, you can create a separate test (not to be included in regression), if you want. 2. We need tests, which will test the plan cache (in)validation upon redistribution of data, tests for testing existing views working after the redistribution. Please take a look at the PG alter table test for more such scenarios. If you happen to add some performance tests, it would be also good to test the sanity of concurrent transactions accessing the object/s being redistributed. It's vital considering that such redistribution would run for longer. 3. Instead of relying on count(*) to show sanity of the redistributed data, you may use better aggregates like array_agg or sum(), avg() and count(). I would prefer array_agg over others, since you can list all the data values there. You will need aggregate's order by clause (Not that of the SELECT). 4. In the case of redistribution of table with index, you will need to check the sanity of index after the redistribution by some means. 5. I did not understand the significance of the tests where you add and drop column and redistribute the data. The SELECT after the redistribution is not testing anything specific for the added/dropped column. 6. There are no testcases which would change the distribution type and node list at the same time. Please add those. (I am assuming that these two operations are possible together). 7. Negative testcases need to improved. Additional feature ================== It will be helpful to add the distribution information in the output of \d command for tables. It will be good tool for tests to check whether the catalogs have been updated correctly or not. Please add this feature before we complete ALTER TABLE. It shouldn't take much time. Please provide this as a separate patch. On Mon, Jul 9, 2012 at 6:51 AM, Michael Paquier <mic...@gm...>wrote: > Please find attached an updated version of the patches for redistribution. > The only modification is the addition of deeper regression tests to check > redistribution of a table by adding and deleting nodes on it. > This is done with a plpgsql function called alter_table_change_nodes I > created myself for this purpose, making transparent ALTER TABLE > TO/ADD/DELETE NODE whatever the cluster configuration used with regressions. 
> > Regards, > > > On Wed, Jul 4, 2012 at 11:02 AM, Michael Paquier < > mic...@gm...> wrote: > >> Please find attached a new version of the 2nd patch. >> This version corrects some bugs related to table columns being dropped >> and added. >> It also contains new regression cases to cover those problems. >> >> Thanks. >> >> >> On Tue, Jul 3, 2012 at 12:15 PM, Michael Paquier < >> mic...@gm...> wrote: >> >>> Hi all, >>> >>> Please find attached 2 patches: 20120703_remotecopy.patch and >>> 20120703_altertable_distrib.patch. >>> 20120703_remotecopy.patch is a lightly modified version of a patch that >>> has already reviewed by Amit where the COPY protocol used by XC code in >>> copy.c is extracted into an external file. >>> This cleans copy.c with a lot of code and simplifies the comprehension >>> of the protocol used. The only part modified is in >>> RemoteCopy_GetRelationLoc where we scan all the attributes of a relation to >>> find the distribution column of a table in case the list of attribute >>> numbers is not specified. This patch has already been reviewed and can be >>> already committed I think. >>> >>> Now the real part, online data redistribution is managed by the second >>> patch: 20120703_altertable_distrib.patch. >>> I am not coming back to the design of the feature that has been chosen. >>> The main modification that is introduced by this patch is the use of a >>> tuple store to store the tuples that need to be redistributed in the >>> cluster. This patch also contains new features that allow to materialize in >>> a tuple slot raw data received by COPY protocol on Coordinator to be able >>> to redirect to correct node a tuple if the new distribution type is hash or >>> modulo. This new mechanism can also be used not only for data >>> redistribution but also to facilitate the exchange of data between nodes >>> (direct consequences on triggers and global constraints). >>> The reverse transformation (from tuple slot to raw data) is also >>> included in this patch. >>> >>> This patch is something like 2000 lines, and does not yet contain the >>> following features which will be added by other patches once this is >>> committed: >>> - No need to materialize in tuple slot if new distribution is replication >>> - No optimization when a replicated table subcluster is reduced (need >>> only to send TRUNCATE to correct nodes) >>> - No optimization when a replicated table subcluster is increased (need >>> only to send tuple to new nodes after fetching it from old nodes) >>> However, I wrote the patch in a way such as those optimizations are easy >>> to implement in the current infrastructure. >>> >>> Please note that this patch contains all the documentation and >>> regression tests. >>> So, any guy has the courage to provide comments to it? >>> -- >>> Michael Paquier >>> http://michael.otacoo.com >>> >> >> >> >> -- >> Michael Paquier >> http://michael.otacoo.com >> > > > > -- > Michael Paquier > http://michael.otacoo.com > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. 
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
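To illustrate point 3 of the review above: a redistribution test can take an order-insensitive fingerprint of the table before and after the ALTER TABLE, rather than relying on count(*) alone. A minimal sketch for a hypothetical distributed table t1(f1 int); the ORDER BY inside the aggregate keeps the output stable no matter which datanode returns its rows first:

    -- before redistribution
    SELECT count(*), sum(f1), array_agg(f1 ORDER BY f1) FROM t1;
    -- ... run the redistribution command under test here ...
    -- after redistribution: all three values must be identical
    SELECT count(*), sum(f1), array_agg(f1 ORDER BY f1) FROM t1;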
From: Koichi S. <koi...@gm...> - 2012-07-09 04:58:00
|
In the case of DBT-1 benchmark, we had up to 12,000 TPS, with ten coordinators and ten datanodes. In this situation, GTM read/wrote about 60MB/sec, which is about 50% of the network max capacity. (sorry again, I'm not quire sure about the exact figure now, but this is very good estimation). Proxies reduced this amount to about a half. As you suggested, yes, there could be a chance to extend GTM scalability a bit more if GTM is equipped with more than one NIC and each NIC is connected to different servers via L2 switch. This will reduce the communication overhead for each communication channel. I've not tried this yet but this sounds interesting to me. Regards; ---------- Koichi Suzuki 2012/7/9 Nikhil Sontakke <ni...@st...>: > Yeah, thanks for bringing out the networking (more important) aspect > here. So it might help to have the GTM traffic happening on a > different private network (separate from the client traffic) for > performance. > >> Of course, GTM proxy reduces the number of GTM threads, which reduces >> the chance of lock conflicts but I've not evaluated this yet. Also, >> I've not evaluated how much GTM cpu is saved by GTM proxy yet. >> > > I did some runs on a 4 node setup of mine and having proxies does seem > to help in reducing the CPU overhead on the GTM node. I had not maxed > out the GTM or anything though, but having proxies around seemed to > help. > > Regards, > Nikhils > >> Regards; >> ---------- >> Koichi Suzuki >> >> >> 2012/7/7 Nikhil Sontakke <ni...@st...>: >>> Hi Shankar, >>> >>> Yeah, the GTM might be able to scale a bit to some level, but after >>> that having the proxies around on each node makes much more sense. It >>> also helps reduce the direct CPU load on the GTM node. And the proxies >>> shouldn't consume that much CPU by themselves too. Unless you are >>> trying a CPU intensive benchmark, but most benchmarks try to churn up >>> IO.. >>> >>> Regards, >>> Nikhils >>> >>> On Fri, Jul 6, 2012 at 9:22 AM, Shankar Hariharan >>> <har...@ya...> wrote: >>>> Hi Ashutosh, >>>> I was trying to size the load on a server and was wondering if a GTM could >>>> be shared w/o much performance overhead between a small number of datanodes >>>> and coordinators. I will post my findings here. >>>> thanks, >>>> Shankar >>>> >>>> ________________________________ >>>> From: Ashutosh Bapat <ash...@en...> >>>> To: Shankar Hariharan <har...@ya...> >>>> Cc: "pos...@li..." >>>> <pos...@li...> >>>> Sent: Friday, July 6, 2012 12:25 AM >>>> Subject: Re: [Postgres-xc-developers] Question on gtm-proxy >>>> >>>> Hi Shankar, >>>> Running gtm-proxy has shown to improve the performance, because it lessens >>>> the load on GTM, by serving requests locally. Why do you want the >>>> coordinators to connect directly to the GTM? Are you seeing any performance >>>> improvement from doing that? >>>> >>>> On Fri, Jul 6, 2012 at 10:08 AM, Shankar Hariharan >>>> <har...@ya...> wrote: >>>> >>>> Follow up to earlier email. In the setup described below, can I avoid using >>>> a gtm-proxy? That is, can I just simply point coordinators to the one gtm >>>> running on node 3 ? >>>> My initial plan was to just run the gtm on node 3 then I thought I could try >>>> a datanode without a local coordinator which was why I put these two >>>> together on node 3. >>>> thanks, >>>> Shankar >>>> >>>> ________________________________ >>>> From: Shankar Hariharan <har...@ya...> >>>> To: "pos...@li..." 
>>>> <pos...@li...> >>>> Sent: Thursday, July 5, 2012 11:35 PM >>>> Subject: Question on multiple coordinators >>>> >>>> Hello, >>>> >>>> Am trying out XC 1.0 in the following configuraiton. >>>> Node 1 - Coord1, Datanode1, gtm-proxy1 >>>> Node 2- Coord2, Datanode2, gtm-proxy2 >>>> Node 3- Datanode3, gtm >>>> >>>> I setup all nodes but forgot to add Coord1 to Coord2 and vice versa. In >>>> addition I missed the pg_hba edit as well. So the first table T1 that I >>>> created for distribution from Coord1 was not "visible| from Coord2 but was >>>> on all the data nodes. >>>> I tried to get Coord2 backinto business in various ways but the first table >>>> I created refused to show up on Coord2 : >>>> - edit pg_hba and add node on both coord1 and 2. Then run select >>>> pgxc_pool_reload(); >>>> - restart coord 1 and 2 >>>> - drop node c2 from c1 and c1 from c2 and add them back followed by select >>>> pgxc_pool_reload(); >>>> >>>> So I tried to create the same table T1 from Coord2 to observe behavior and >>>> it did not like it clearly as all nodes it "wrote" to reported that the >>>> table already existed which was good. At this point I could understand that >>>> Coord2 and Coord1 are not talking alright so I created a new table from >>>> coord1 with replication. This table was visible from both now. >>>> >>>> Question is should I expect to see the first table, let me call it T1 after >>>> a while from Coord2 also? >>>> >>>> >>>> thanks, >>>> Shankar >>>> >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> Live Security Virtual Conference >>>> Exclusive live event will cover all the ways today's security and >>>> threat landscape has changed and how IT managers can respond. Discussions >>>> will include endpoint security, mobile security and the latest in malware >>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>> _______________________________________________ >>>> Postgres-xc-developers mailing list >>>> Pos...@li... >>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>> >>>> >>>> >>>> >>>> -- >>>> Best Wishes, >>>> Ashutosh Bapat >>>> EntepriseDB Corporation >>>> The Enterprise Postgres Company >>>> >>>> >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> Live Security Virtual Conference >>>> Exclusive live event will cover all the ways today's security and >>>> threat landscape has changed and how IT managers can respond. Discussions >>>> will include endpoint security, mobile security and the latest in malware >>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>> _______________________________________________ >>>> Postgres-xc-developers mailing list >>>> Pos...@li... >>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>> >>> >>> >>> >>> -- >>> StormDB - http://www.stormdb.com >>> The Database Cloud >>> >>> ------------------------------------------------------------------------------ >>> Live Security Virtual Conference >>> Exclusive live event will cover all the ways today's security and >>> threat landscape has changed and how IT managers can respond. Discussions >>> will include endpoint security, mobile security and the latest in malware >>> threats. 
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> _______________________________________________ >>> Postgres-xc-developers mailing list >>> Pos...@li... >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > > > -- > StormDB - http://www.stormdb.com > The Database Cloud |
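A quick back-of-the-envelope check on Koichi's figures: 60 MB/s of GTM traffic at roughly 12,000 TPS works out to about 60,000,000 / 12,000 ≈ 5 KB of GXID/snapshot traffic per transaction, and 60 MB/s is indeed around half of what a single gigabit NIC can carry (~120 MB/s). Both remedies mentioned here attack the same bottleneck: proxies batch requests and cut the traffic roughly in half, while a second NIC raises the ceiling.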
From: Ashutosh B. <ash...@en...> - 2012-07-09 04:30:46
|
Hi Shankar, You have got answers to the prepared transaction problem, I guess. I have something else below. On Sat, Jul 7, 2012 at 1:44 AM, Shankar Hariharan < har...@ya...> wrote: > As planned I ran some tests using PGBench on this setup : > > Node 1 - Coord1, Datanode1, gtm-proxy1 > Node 2- Coord2, Datanode2, gtm-proxy2 > Node 3- Datanode3, gtm > > I was connecting via Coord1 for these tests: > - scale factor of 30 used > - tests run using the following input parameters for pgbench: > Try connecting to both the coordinators, it should give you better performance, esp, when you are using distributed tables. With distributed tables, coordinator gets involved in query execution more than that in the case of replicated tables. So, balancing load across two coordinators would help. > > Clients Threads Duration Transactions > 1 1 100 6204 > 2 2 100 9960 > 4 4 100 12880 > 6 6 100 1676 > > 8 > 8 8 100 19758 > 10 10 100 21944 > 12 12 100 20674 > > The run went well until the 8 clients. I started seeing errors on 10 > clients onwards and eventually the 14 client run has been hanging around > for over an hour now. The errors I have been seeing on console are the > following : > > pgbench console : > Client 8 aborted in state 12: ERROR: GTM error, could not obtain snapshot > Client 0 aborted in state 13: ERROR: maximum number of prepared > transactions reached > Client 7 aborted in state 13: ERROR: maximum number of prepared > transactions reached > Client 11 aborted in state 13: ERROR: maximum number of prepared > transactions reached > Client 9 aborted in state 13: ERROR: maximum number of prepared > transactions reached > > node console: > ERROR: GTM error, could not obtain snapshot > STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) > VALUES (253, 26, 1888413, -817, CURRENT_TIMESTAMP); > ERROR: maximum number of prepared transactions reached > HINT: Increase max_prepared_transactions (currently 10). > STATEMENT: PREPARE TRANSACTION 'T201428' > ERROR: maximum number of prepared transactions reached > STATEMENT: END; > ERROR: maximum number of prepared transactions reached > STATEMENT: END; > ERROR: maximum number of prepared transactions reached > STATEMENT: END; > ERROR: maximum number of prepared transactions reached > STATEMENT: END; > ERROR: GTM error, could not obtain snapshot > STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) > VALUES (140, 29, 2416403, -4192, CURRENT_TIMESTAMP); > > I was also watching the processes on each node and see the following for > the 14 client run: > > > Node1 : > postgres 25571 10511 0 04:41 ? 00:00:02 postgres: postgres > postgres ::1(33481) TRUNCATE TABLE waiting > postgres 25620 11694 0 04:46 ? 00:00:00 postgres: postgres > postgres pgbench-address (50388) TRUNCATE TABLE > > Node2: > postgres 10979 9631 0 Jul05 ? 00:00:42 postgres: postgres > postgres coord1-address(57357) idle in transaction > > Node3: > postgres 20264 9911 0 08:35 ? 00:00:05 postgres: postgres > postgres coord1-address(51406) TRUNCATE TABLE waiting > > > I was going to restart the processes on all nodes and start over but did > not want to lose this data as it could be useful information. > > Any explanation on the above issue is much appreciated. I will try the > next run with a higher value set for max_prepared_transactions. Any > recommendations for a good value on this front? > > thanks, > Shankar > > > ------------------------------ > *From:* Shankar Hariharan <har...@ya...> > *To:* Ashutosh Bapat <ash...@en...> > *Cc:* "pos...@li..." 
< > pos...@li...> > *Sent:* Friday, July 6, 2012 8:22 AM > > *Subject:* Re: [Postgres-xc-developers] Question on gtm-proxy > > Hi Ashutosh, > I was trying to size the load on a server and was wondering if a GTM > could be shared w/o much performance overhead between a small number of > datanodes and coordinators. I will post my findings here. > thanks, > Shankar > > ------------------------------ > *From:* Ashutosh Bapat <ash...@en...> > *To:* Shankar Hariharan <har...@ya...> > *Cc:* "pos...@li..." < > pos...@li...> > *Sent:* Friday, July 6, 2012 12:25 AM > *Subject:* Re: [Postgres-xc-developers] Question on gtm-proxy > > Hi Shankar, > Running gtm-proxy has shown to improve the performance, because it lessens > the load on GTM, by serving requests locally. Why do you want the > coordinators to connect directly to the GTM? Are you seeing any performance > improvement from doing that? > > On Fri, Jul 6, 2012 at 10:08 AM, Shankar Hariharan < > har...@ya...> wrote: > > Follow up to earlier email. In the setup described below, can I avoid > using a gtm-proxy? That is, can I just simply point coordinators to the one > gtm running on node 3 ? > My initial plan was to just run the gtm on node 3 then I thought I could > try a datanode without a local coordinator which was why I put these two > together on node 3. > thanks, > Shankar > > ------------------------------ > *From:* Shankar Hariharan <har...@ya...> > *To:* "pos...@li..." < > pos...@li...> > *Sent:* Thursday, July 5, 2012 11:35 PM > *Subject:* Question on multiple coordinators > > Hello, > > Am trying out XC 1.0 in the following configuraiton. > Node 1 - Coord1, Datanode1, gtm-proxy1 > Node 2- Coord2, Datanode2, gtm-proxy2 > Node 3- Datanode3, gtm > > I setup all nodes but forgot to add Coord1 to Coord2 and vice versa. In > addition I missed the pg_hba edit as well. So the first table T1 that I > created for distribution from Coord1 was not "visible| from Coord2 but > was on all the data nodes. > I tried to get Coord2 backinto business in various ways but the first > table I created refused to show up on Coord2 : > - edit pg_hba and add node on both coord1 and 2. Then run select > pgxc_pool_reload(); > - restart coord 1 and 2 > - drop node c2 from c1 and c1 from c2 and add them back followed by select > pgxc_pool_reload(); > > So I tried to create the same table T1 from Coord2 to observe behavior > and it did not like it clearly as all nodes it "wrote" to reported that the > table already existed which was good. At this point I could understand that > Coord2 and Coord1 are not talking alright so I created a new table from > coord1 with replication. This table was visible from both now. > > Question is should I expect to see the first table, let me call it T1 > after a while from Coord2 also? > > > thanks, > Shankar > > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... 
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > > > > > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
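On the distributed-versus-replicated point above: with pgbench's schema, the distribution strategy is chosen per table at creation time using XC's DISTRIBUTE BY clause. A sketch for two of the tables (column layouts follow stock pgbench; the choice of which table to replicate and which to hash is illustrative, not a tested recommendation):

    CREATE TABLE pgbench_branches (bid int PRIMARY KEY, bbalance int, filler char(88))
        DISTRIBUTE BY REPLICATION;
    CREATE TABLE pgbench_accounts (aid int PRIMARY KEY, bid int, abalance int, filler char(84))
        DISTRIBUTE BY HASH (aid);

Hash-distributing the large accounts table spreads its updates across the datanodes, while a small replicated table can be read locally on any node.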
From: Michael P. <mic...@gm...> - 2012-07-09 04:19:47
|
If you want to have fun: http://sourceforge.net/search/index.php?group_id=311227&type_of_search=mlists&limit=25&q=redistribution On Mon, Jul 9, 2012 at 1:12 PM, Nikhil Sontakke <ni...@st...> wrote: > > Patch, design docs, and discussion threads are here, so it is just > necessary > > to follow the events. > > > > Umm, sorry but I was not able to find the design docs in the thread > here.. Can you post them again please? > > Regards, > Nikhils > -- > StormDB - http://www.stormdb.com > The Database Cloud > -- Michael Paquier http://michael.otacoo.com |
From: Nikhil S. <ni...@st...> - 2012-07-09 04:12:41
|
> Patch, design docs, and discussion threads are here, so it is just necessary > to follow the events. > Umm, sorry, but I was not able to find the design docs in the thread here. Can you post them again, please? Regards, Nikhils -- StormDB - http://www.stormdb.com The Database Cloud |
From: Michael P. <mic...@gm...> - 2012-07-09 04:02:42
|
On Mon, Jul 9, 2012 at 12:46 PM, Nikhil Sontakke <ni...@st...> wrote: > > Hence, the maximum value of max_prepared_transactions with which you > will be > > sure that this error will not come out is the sum of the max_connections > of > > all the Coordinators of your cluster. However you are more or less sure > that > > you won't have a 2PC occurring on a Datanode at the same time for all the > > backends of Coordinators, so usually max_prepared_transactions could be > set > > safely at 30%~40% of the sum of Coordinators' max_connections. Check with > > your application. > > > > Why is that 30-40% value safe? > This estimation is based on my long experience testing XC with several benchmarks. > Why are we sure that we won't have a 2PC occurring on a datanode at the > same time for the coordinator backends? > It is possible; it depends on the application. > > He is using PGBENCH which is TPC-B and it's 100% updates. Now we don't > know how his tables are laid out, but I am guessing that it will > surely cause lots of 2PCs. So you are right, the value of > max_prepared_transactions on datanodes will have to consider typical > concurrent connections that might come from all the coordinator nodes. > I would imagine that this pgbench run is going to be slow on XC, as there will be a lot of concurrent updates. So yeah, here you perhaps need a higher value; I just mean that this setting depends on the network, the machines, the cluster configuration (Co/Dn together/separated) and the application used. -- Michael Paquier http://michael.otacoo.com |
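In numbers: for a cluster like the one in this thread, with two coordinators and, say, max_connections = 100 on each, the sizing discussed above works out as follows in every datanode's postgresql.conf (values illustrative, not prescriptive):

    # worst case: every coordinator backend holds a prepared transaction here
    # sum of coordinator max_connections = 100 + 100 = 200
    max_prepared_transactions = 200   # guaranteed never to hit the limit
    # or, per the 30%~40% rule of thumb: 0.3 * 200 = 60 up to 0.4 * 200 = 80
    # max_prepared_transactions = 80

A restart of the datanodes is needed for a new value to take effect, since max_prepared_transactions cannot be changed on a running server.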
From: Nikhil S. <ni...@st...> - 2012-07-09 03:58:11
|
Yeah, thanks for bringing out the networking (more important) aspect here. So it might help to have the GTM traffic happen on a separate private network (away from the client traffic) for performance. > Of course, GTM proxy reduces the number of GTM threads, which reduces > the chance of lock conflicts, but I've not evaluated this yet. Also, > I've not evaluated how much GTM cpu is saved by GTM proxy yet. I did some runs on a 4-node setup of mine, and having proxies does seem to help in reducing the CPU overhead on the GTM node. I had not maxed out the GTM or anything, but having the proxies around did seem to help. Regards, Nikhils -- StormDB - http://www.stormdb.com The Database Cloud |
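For reference, the private-network layout suggested here might look roughly like this in gtm_proxy.conf on node 1. This is only a sketch: the parameter names follow gtm_proxy.conf.sample, and the 10.0.0.x interconnect addresses are hypothetical.

# gtm_proxy.conf -- keep GTM traffic off the client network
nodename = 'gtm_proxy1'
listen_addresses = '*'
port = 6666
gtm_host = '10.0.0.3'    # GTM on node 3, reached over the private interconnect
gtm_port = 6666
worker_threads = 1       # each worker groups its backends' requests to GTM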
From: Nikhil S. <ni...@st...> - 2012-07-09 03:46:49
|
> Hence, the maximum value of max_prepared_transactions with which you will be > sure that this error will not come out is the sum of the max_connections of > all the Coordinators of your cluster. However you are more or less sure that > you won't have a 2PC occurring on a Datanode at the same time for all the > backends of Coordinators, so usually max_prepared_transactions can be set > safely at 30%~40% of the sum of Coordinators' max_connections. Check with > your application. Why is that 30-40% value safe? Why are we sure that we won't have a 2PC occurring on a datanode at the same time for the coordinator backends? He is using pgbench, which is TPC-B and 100% updates. Now we don't know how his tables are laid out, but I am guessing that it will surely cause lots of 2PCs. So you are right, the value of max_prepared_transactions on datanodes will have to consider the typical concurrent connections that might come from all the coordinator nodes. Regards, Nikhils -- StormDB - http://www.stormdb.com The Database Cloud |
From: Michael P. <mic...@gm...> - 2012-07-09 02:09:36
|
Hi all,

The first 2 patches I submitted to the MLs implement a global mechanism for redistribution. This mechanism might be slow and is really basic, but it has the following merits:
- implementation of a generic grammar (refactoring of gram.y at the same time)
- refactoring of XC-related CREATE TABLE code to use the same APIs as ALTER TABLE
- addition of a tuplestore mechanism to materialize data received through the COPY protocol and send it back in COPY format to the remote nodes after dematerializing it
- COPY code related to XC refactored and classified under the banner of "remote COPY"

This resulted in a total of 2500 lines. And I can already imagine you guys complaining about the performance, which is not as good as the DELETE ... RETURNING method proposed by Amit. However, I think we should commit those patches so that I have a good base to work on what I believe is the next step of the redistribution implementation. The method proposed by Amit is good, but only under certain circumstances and only for some redistribution conditions. And I think this is only the visible tip of the iceberg; 90% of it is still hidden from us.

For the last couple of days, I have been thinking about some generic ideas for implementing a solid base for redistribution operations that would allow us to include as many optimizations as we want, like Amit's. I noticed that redistribution is only a matter of repeated operations that may change depending on:
1) changing the distribution type of the table
2) reducing the nodes where data is located
3) increasing the nodes where data is located

Let's take a couple of examples:
1) A replicated table whose set of nodes is reduced. All we need to do is launch a TRUNCATE on the nodes where the data is no longer located.
2) A replicated table whose set of nodes is increased. We need to fetch the data on the Coordinator and then send it to the new nodes of the table.
3) A table changed from replicated to distributed on the same set of nodes. We only need to send a DELETE command to each remote node to delete the tuples that no longer belong there.
4) A table changed from distributed to replicated. We can use Amit's idea here: DELETE with RETURNING.
5) A table changed from distributed to distributed, with a modification of the distribution column. Worst case... But we can still use Amit's idea, the DELETE with RETURNING.

However, RETURNING is not supported yet, so for the time being we need to fall back to the basic method in cases 4 and 5. Those are not the only examples. Btw, my point is that data redistribution can be described as a succession of small and simple operations that need to be launched on a set of nodes, and the goal of the game here is to identify the list of operations to be done for a given redistribution.

Here is the point of my next implementation step: we need an algorithm that can classify all the necessary operations depending on the redistribution being done (like a planner, but for redistribution), and then launch them (like an executor). Once this basic planner/executor architecture is put in place for table redistribution, it is easy to plug in new operation types which are optimizations, and it will heavily reduce the implementation cost of new things (like the DELETE RETURNING tuning) and their maintenance. I am currently working on the implementation of this architecture, which will be a patch based on the 2 others I sent before.

Then the optimizations for cases 1/2/3/4/5 could be added one by one, with single patches that launch those operations after analyzing the redistribution operation. The addition of each optimization could also be done only up to a certain point for now (1/2/3 only?), and that will allow me to move on to the 9.2 merge. So here is my strategy. Comments? -- Michael Paquier http://michael.otacoo.com |
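To make the cases concrete, the per-node operations described above boil down to plain SQL along these lines. This is only an illustration with a hypothetical table t1, not the actual redistribution code, which drives equivalent steps internally:

-- Case 1: replicated table, node set reduced:
-- on each node removed from the set, drop the stale local copy.
TRUNCATE TABLE t1;

-- Case 2: replicated table, node set increased:
-- fetch the rows on the Coordinator from one existing node,
-- then replay them on each newly added node.
COPY t1 TO STDOUT;
COPY t1 FROM STDIN;

-- Cases 4/5: distributed -> replicated, or new distribution column:
-- collect each node's rows while deleting them (once RETURNING is
-- supported), then push the collected rows back out.
DELETE FROM t1 RETURNING *;
COPY t1 FROM STDIN;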
From: Michael P. <mic...@gm...> - 2012-07-09 01:07:58
|
Just to tell you guys that I fixed this issue. Here is the fix on master: https://github.com/postgres-xc/postgres-xc/commit/cc12ea8e67c46f9782804105915dcc90725a1f66 It has also been back-patched to 1.0 stable. Regards, On Sat, Jul 7, 2012 at 8:16 AM, Michael Paquier <mic...@gm...> wrote: > On 2012/07/07, at 0:03, Shankar Hariharan <har...@ya...> wrote: > Michael, read your message on incorrect copy. I was wondering if it is possible to both replicate and distribute a table. Just wondering if it can be used to gather all distributed data in one spot to work around data loss. > You cannot yet do distribution and replication of a table at the same time. And there is no data loss in my bug; we just do not select the correct node when running COPY in those particular circumstances. For reference, the original report: > While testing data redistribution I found this bug with COPY. It is reproducible with master, and very probably with 1.0 stable.
> postgres=# create table aa (a int) distribute by replication to node dn2;
> CREATE TABLE
> postgres=# insert into aa values (generate_series(1,10));
> INSERT 0 10
> postgres=# copy aa to stdout; -- no output here
> postgres=#
The bug is registered here: https://sourceforge.net/tracker/?func=detail&aid=3540784&group_id=311227&atid=1310232 -- Michael Paquier http://michael.otacoo.com |
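With the fix in, the same reproduction case should return the replicated rows; the output below is the expected behavior, not a captured session:

postgres=# copy aa to stdout;
1
2
3
4
5
6
7
8
9
10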
From: Koichi S. <koi...@gm...> - 2012-07-09 00:28:49
|
Most of the overhead associated with GTM is network communication. As you know, every transaction needs at least a GXID and a snapshot from GTM, which leads to an enormous number of network interactions, though not necessarily a large amount of data. If the transaction isolation mode is read committed, each statement needs a separate snapshot, which adds much more network workload to GTM. GTM proxy improves this: it groups multiple requests from backends into a single message, reducing the number of interactions. Moreover, multiple snapshot requests are reduced to a single one; one snapshot from GTM is copied to multiple backends, which reduces the message size. Of course, GTM proxy also reduces the number of GTM threads, which reduces the chance of lock conflicts, but I've not evaluated this yet. Also, I've not yet evaluated how much GTM CPU is saved by GTM proxy. Regards; ---------- Koichi Suzuki 2012/7/7 Nikhil Sontakke <ni...@st...>: > Hi Shankar, > Yeah, the GTM might be able to scale a bit to some level, but after > that having the proxies around on each node makes much more sense. It > also helps reduce the direct CPU load on the GTM node. And the proxies > shouldn't consume that much CPU by themselves either, unless you are > trying a CPU-intensive benchmark; most benchmarks try to churn up IO. > Regards, > Nikhils > -- > StormDB - http://www.stormdb.com > The Database Cloud |
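A practical corollary of the per-statement snapshots mentioned here: a repeatable-read transaction takes its snapshot once, so batching reads into one such transaction cuts the number of snapshot requests reaching GTM. This is plain PostgreSQL behavior rather than an XC-specific knob, shown with pgbench's table names:

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM pgbench_accounts;      -- snapshot taken at the first statement
SELECT sum(abalance) FROM pgbench_accounts; -- reuses the same snapshot
COMMIT;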
From: Michael P. <mic...@gm...> - 2012-07-08 23:36:40
|
Just giving some precision here. In XC we use an internal 2PC process at transaction commit if the transaction involves 2 or more nodes in a write operation (DML or DDL). So each connection from your application to a Coordinator may end up running a 2PC transaction. Hence, the maximum value of max_prepared_transactions with which you will be sure that this error will not come out is the sum of the max_connections of all the Coordinators of your cluster. However, you are more or less sure that you won't have a 2PC occurring on a Datanode at the same time for all the backends of all Coordinators, so usually max_prepared_transactions can be set safely at 30%~40% of the sum of Coordinators' max_connections. Check with your application. On Sat, Jul 7, 2012 at 2:27 PM, Nikhil Sontakke <ni...@st...> wrote: > How many clients do you want to run with this eventually? That will > determine a decent value for max_prepared_transactions. Note that > max_prepared_transactions takes a wee bit more shared memory per > prepared transaction. But it's OK to set it high, proportionate to the > max_connections value. -- Michael Paquier http://michael.otacoo.com |
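Putting numbers on this for a cluster like the one in this thread (a sketch; the percentage is the rough guideline above, not a hard limit): two Coordinators with max_connections = 100 each give a worst case of 200 concurrent 2PC transactions per Datanode, so:

# postgresql.conf on each Datanode
# Fully safe value: sum of Coordinators' max_connections (2 x 100 = 200).
# Following the 30%~40% guideline instead:
max_prepared_transactions = 80    # ~40% of 200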
From: Mason S. <ma...@st...> - 2012-07-08 16:09:33
|
On Sat, Jul 7, 2012 at 7:52 PM, Shankar Hariharan <har...@ya...> wrote: > Thanks Nikhil. I have set both to 100 for my next run. I have another > question: if I create a table w/o specifying the distribution strategy, I > still see that the data is distributed across the nodes. What is the default > distribution strategy? It tries to use the first column of a primary key or unique index, if specified in the CREATE TABLE statement, or the first column of a foreign key. If neither is available, it uses the first column with a reasonable data type (i.e., not BYTEA, not BOOLEAN). > I did run some tests across 3 nodes and noticed that the data is not > always distributed equally. For instance, when I first inserted 10 > records (all integer values) I noticed that datanode 1 got just one record > while the other two nodes were almost equal. However, after 2 > more inserts of 10 records each, all 3 nodes were almost at the same load > level (w.r.t. number of records). Yes, I think there were just too few rows in your sample data set. As it gets big, it will even out. -- Mason Sharp StormDB - http://www.stormdb.com The Database Cloud |
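One way to check which strategy a table ended up with is the pgxc_class catalog. A sketch, assuming the XC 1.0 catalog layout, where pclocatortype 'H' means hash distribution and pcattnum is the attribute number of the distribution column; for the test table discussed here one would expect something like:

postgres=# select pclocatortype, pcattnum from pgxc_class where pcrelid = 'test'::regclass;
 pclocatortype | pcattnum
---------------+----------
 H             |        1
(1 row)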
From: Shankar H. <har...@ya...> - 2012-07-07 23:53:06
|
Thanks Nikhil. I have set both to 100 for my next run. I have another question: if I create a table w/o specifying the distribution strategy, I still see that the data is distributed across the nodes. What is the default distribution strategy? I did run some tests across 3 nodes and noticed that the data is not always distributed equally. For instance, when I first inserted 10 records (all integer values) I noticed that datanode 1 got just one record while the other two nodes were almost equal. However, after 2 more inserts of 10 records each, all 3 nodes were almost at the same load level (w.r.t. number of records). Table used for test:
 Column |  Type   | Modifiers
--------+---------+-------------------------------------------------------
 id     | integer | not null default nextval('test_id_seq'::regclass)
 num    | integer |
thanks, Shankar ________________________________ From: Nikhil Sontakke <ni...@st...> To: Shankar Hariharan <har...@ya...> Cc: Ashutosh Bapat <ash...@en...>; "pos...@li..." <pos...@li...> Sent: Saturday, July 7, 2012 12:27 AM Subject: Re: [Postgres-xc-developers] Question on gtm-proxy > Any explanation on the above issue is much appreciated. I will try the next > run with a higher value set for max_prepared_transactions. Any > recommendations for a good value on this front? > How many clients do you want to run with this eventually? That will determine a decent value for max_prepared_transactions. Note that max_prepared_transactions takes a wee bit more shared memory per prepared transaction. But it's OK to set it high, proportionate to the max_connections value. Regards, Nikhils |
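To see exactly how the rows landed, one can count them per node from a Coordinator with EXECUTE DIRECT. A sketch, assuming the datanodes were registered under the names datanode1/2/3:

EXECUTE DIRECT ON (datanode1) 'SELECT count(*) FROM test';
EXECUTE DIRECT ON (datanode2) 'SELECT count(*) FROM test';
EXECUTE DIRECT ON (datanode3) 'SELECT count(*) FROM test';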
From: Nikhil S. <ni...@st...> - 2012-07-07 05:27:48
|
> Any explanation on the above issue is much appreciated. I will try the next > run with a higher value set for max_prepared_transactions. Any > recommendations for a good value on this front? How many clients do you want to run with this eventually? That will determine a decent value for max_prepared_transactions. Note that max_prepared_transactions takes a wee bit more shared memory per prepared transaction. But it's OK to set it high, proportionate to the max_connections value. Regards, Nikhils -- StormDB - http://www.stormdb.com The Database Cloud |
From: Andrei M. <and...@gm...> - 2012-07-06 23:34:14
2012/7/6 Michael Paquier <mic...@gm...>

> On 2012/07/06, at 23:18, Andrei Martsinchyk <and...@gm...> wrote:
>
> Hi Shankar,
>
> You can safely clone Coordinators by plain copying of the data directory
> and adjusting the configuration afterwards. That approach may be used to
> fix your problem or to add a coordinator to an existing cluster.
>
> 1. Stop all cluster components.
> 2. Copy the coordinator database to the new location.
> 3. Start GTM, the GTM proxy if appropriate, and the Coordinator,
> specifying -D <new datadir location>. Make sure you are not running a
> coordinator on the master copy of the data directory; the two copies
> still share the same node name, and GTM would not allow that.
> 4. Connect psql or another client to the coordinator and create a record
> in the pgxc_node table for the future "self" entry using the CREATE NODE
> command. You may also need to adjust the connection info for the current
> "self", which will still point to the original coordinator, since the
> view may differ from the new location; for example, you may want to
> replace host = 'localhost' with host = '<IP address>'.
> 5. Adjust the configuration of the new coordinator. You must change
> pgxc_node_name so it is unique in the cluster; if the new location is on
> the same box, you may also need to change the port it listens on for
> client connections and the pooler port. The configuration should match
> the "self" entry you created in the previous step.
> 6. Restart the new coordinator, then start the other cluster components.
> 7. Connect to the old coordinators and use the CREATE NODE command to
> make them aware of the new coordinator.
> 8. Enjoy.
>
> The point here is that we shouldn't have to stop the cluster for a
> coordinator addition. You need to protect your cluster from DDL intrusion
> while copying catalog data to the new coordinator.

Agreed, but the master coordinator has to be stopped when the copy is
started for the first time, otherwise GTM would not allow it to connect.
There is a workaround with another temporary GTM for the copy, but if it
is not a production cluster it is simpler just to stop everything.

> 2012/7/6 Shankar Hariharan <har...@ya...>
>
>> Hello,
>>
>> Am trying out XC 1.0 in the following configuration.
>> Node 1 - Coord1, Datanode1, gtm-proxy1
>> Node 2 - Coord2, Datanode2, gtm-proxy2
>> Node 3 - Datanode3, gtm
>>
>> I set up all nodes but forgot to add Coord1 to Coord2 and vice versa. In
>> addition I missed the pg_hba edit as well. So the first table T1 that I
>> created for distribution from Coord1 was not "visible" from Coord2 but
>> was on all the data nodes.
>> I tried to get Coord2 back into business in various ways, but the first
>> table I created refused to show up on Coord2:
>> - edit pg_hba and add the node on both coord1 and 2, then run select
>> pgxc_pool_reload();
>> - restart coord1 and 2
>> - drop node c2 from c1 and c1 from c2 and add them back, followed by
>> select pgxc_pool_reload();
>>
>> So I tried to create the same table T1 from Coord2 to observe the
>> behavior, and it clearly did not like it: all the nodes it "wrote" to
>> reported that the table already existed, which was good. At this point I
>> could see that Coord2 and Coord1 were not talking properly, so I created
>> a new table from Coord1 with replication. That table was visible from
>> both.
>>
>> Question: should I expect to see the first table, call it T1, show up on
>> Coord2 after a while?
>>
>> thanks,
>> Shankar
>
> --
> Andrei Martsinchyk
>
> StormDB - http://www.stormdb.com
> The Database Cloud

--
Andrei Martsinchyk

StormDB - http://www.stormdb.com
The Database Cloud
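Read as a shell session, the eight steps above might look roughly like the
sketch below. Every path, host, port, and node name is a placeholder
invented for illustration; the config edits are folded in before the
clone's first start rather than applied at step 5, and the CREATE NODE,
DROP NODE, and pgxc_pool_reload() calls are the same commands used
elsewhere in this thread.

    # Placeholders throughout; GTM and the proxies are assumed to be
    # running already, per step 3.
    OLD=/data/coord1   # data directory of the existing coordinator
    NEW=/data/coord2   # target location for the clone

    pg_ctl stop -D "$OLD" -m fast   # step 1
    cp -a "$OLD" "$NEW"             # step 2

    # Step 5, done before the first start: unique node name and own ports
    # (assumes these keys are already present uncommented in the file).
    sed -i "s/^pgxc_node_name =.*/pgxc_node_name = 'coord2'/" "$NEW/postgresql.conf"
    sed -i "s/^port =.*/port = 5433/" "$NEW/postgresql.conf"
    sed -i "s/^pooler_port =.*/pooler_port = 6668/" "$NEW/postgresql.conf"

    pg_ctl start -Z coordinator -D "$NEW"   # steps 3 and 6
    # Step 4: register the clone's own "self" entry, then repoint the row
    # inherited from the original, which may still say host='localhost'.
    psql -p 5433 -d postgres -c "CREATE NODE coord2 WITH (TYPE = 'coordinator', HOST = '192.168.1.2', PORT = 5433)"
    psql -p 5433 -d postgres -c "DROP NODE coord1"
    psql -p 5433 -d postgres -c "CREATE NODE coord1 WITH (TYPE = 'coordinator', HOST = '192.168.1.1', PORT = 5432)"
    psql -p 5433 -d postgres -c "SELECT pgxc_pool_reload()"

    pg_ctl start -Z coordinator -D "$OLD"   # original back online
    # Step 7: make the old coordinator aware of the new one.
    psql -p 5432 -d postgres -c "CREATE NODE coord2 WITH (TYPE = 'coordinator', HOST = '192.168.1.2', PORT = 5433)"
    psql -p 5432 -d postgres -c "SELECT pgxc_pool_reload()"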
From: Michael P. <mic...@gm...> - 2012-07-06 23:19:08
On 2012/07/06, at 23:18, Andrei Martsinchyk <and...@gm...> wrote:

> Hi Shankar,
>
> You can safely clone Coordinators by plain copying of the data directory
> and adjusting the configuration afterwards. That approach may be used to
> fix your problem or to add a coordinator to an existing cluster.
>
> 1. Stop all cluster components.
> 2. Copy the coordinator database to the new location.
> 3. Start GTM, the GTM proxy if appropriate, and the Coordinator,
> specifying -D <new datadir location>. Make sure you are not running a
> coordinator on the master copy of the data directory; the two copies
> still share the same node name, and GTM would not allow that.
> 4. Connect psql or another client to the coordinator and create a record
> in the pgxc_node table for the future "self" entry using the CREATE NODE
> command. You may also need to adjust the connection info for the current
> "self", which will still point to the original coordinator, since the
> view may differ from the new location; for example, you may want to
> replace host = 'localhost' with host = '<IP address>'.
> 5. Adjust the configuration of the new coordinator. You must change
> pgxc_node_name so it is unique in the cluster; if the new location is on
> the same box, you may also need to change the port it listens on for
> client connections and the pooler port. The configuration should match
> the "self" entry you created in the previous step.
> 6. Restart the new coordinator, then start the other cluster components.
> 7. Connect to the old coordinators and use the CREATE NODE command to
> make them aware of the new coordinator.
> 8. Enjoy.

The point here is that we shouldn't have to stop the cluster for a
coordinator addition. You need to protect your cluster from DDL intrusion
while copying catalog data to the new coordinator.

> 2012/7/6 Shankar Hariharan <har...@ya...>
>
> Hello,
>
> Am trying out XC 1.0 in the following configuration.
> Node 1 - Coord1, Datanode1, gtm-proxy1
> Node 2 - Coord2, Datanode2, gtm-proxy2
> Node 3 - Datanode3, gtm
>
> I set up all nodes but forgot to add Coord1 to Coord2 and vice versa. In
> addition I missed the pg_hba edit as well. So the first table T1 that I
> created for distribution from Coord1 was not "visible" from Coord2 but
> was on all the data nodes.
> I tried to get Coord2 back into business in various ways, but the first
> table I created refused to show up on Coord2:
> - edit pg_hba and add the node on both coord1 and 2, then run select
> pgxc_pool_reload();
> - restart coord1 and 2
> - drop node c2 from c1 and c1 from c2 and add them back, followed by
> select pgxc_pool_reload();
>
> So I tried to create the same table T1 from Coord2 to observe the
> behavior, and it clearly did not like it: all the nodes it "wrote" to
> reported that the table already existed, which was good. At this point I
> could see that Coord2 and Coord1 were not talking properly, so I created
> a new table from Coord1 with replication. That table was visible from
> both.
>
> Question: should I expect to see the first table, call it T1, show up on
> Coord2 after a while?
>
> thanks,
> Shankar
>
> --
> Andrei Martsinchyk
>
> StormDB - http://www.stormdb.com
> The Database Cloud
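For the narrower problem that started the thread, Coord1 and Coord2 simply
not knowing about each other, the registration itself is just the commands
already quoted above; a hypothetical rendering, with node names, hosts,
and ports as placeholders:

    # On coord1, register coord2, then refresh the pooler; then the
    # mirror image on coord2. All names and addresses are placeholders.
    psql -h node1 -p 5432 -d postgres -c "CREATE NODE coord2 WITH (TYPE = 'coordinator', HOST = 'node2', PORT = 5432)"
    psql -h node1 -p 5432 -d postgres -c "SELECT pgxc_pool_reload()"
    psql -h node2 -p 5432 -d postgres -c "CREATE NODE coord1 WITH (TYPE = 'coordinator', HOST = 'node1', PORT = 5432)"
    psql -h node2 -p 5432 -d postgres -c "SELECT pgxc_pool_reload()"

Note that this only fixes the propagation of future DDL: a table created
while the coordinators were not registered with each other never reached
Coord2's catalog, and registering them afterwards does not backfill it.
That is why T1 will not simply show up on Coord2 later, and why the
data-directory clone described above is the practical repair.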