From: Abbas B. <abb...@en...> - 2013-03-31 08:37:13
Hi, Attached please find the revised patch for restore mode. This patch has to be applied on top of the patches I sent earlier for 3608377, 3608376 & 3608375. I have also attached some scripts and a C file that are useful for testing the whole procedure; they set up a test database that has many objects in it. Here are the revised instructions for adding new nodes to the cluster.

======================================
Here are the steps to add a new coordinator

1) Initdb the new coordinator
   /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data_cord3 --nodename coord_3
2) Make the necessary changes in its postgresql.conf, in particular specify the new coordinator name and pooler port
3) Connect to any of the existing coordinators and lock the cluster for backup; do not close this session
   ./psql postgres -p 5432
   select pgxc_lock_for_backup();
4) Connect to any of the existing coordinators and take a backup of the database
   ./pg_dumpall -p 5432 -s --include-nodes --dump-nodes --file=/home/edb/Desktop/NodeAddition/revised_patches/misc_dumps/1100_all_objects_coord.sql
5) Start the new coordinator, specifying --restoremode
   ./postgres --restoremode -D ../data_cord3 -p 5455
6) Create the new database on the new coordinator - optional
   ./createdb test -p 5455
7) Restore the backup that was taken from an existing coordinator by connecting to the new coordinator directly
   ./psql -d test -f /home/edb/Desktop/NodeAddition/revised_patches/misc_dumps/1100_all_objects_coord.sql -p 5455
8) Quit the new coordinator
9) Start the new coordinator as a coordinator by specifying --coordinator
   ./postgres --coordinator -D ../data_cord3 -p 5455
10) Create the new coordinator on the rest of the coordinators and reload the configuration
   CREATE NODE COORD_3 WITH (HOST = 'localhost', type = 'coordinator', PORT = 5455);
   SELECT pgxc_pool_reload();
11) Quit the session of step 3; this will unlock the cluster
12) The new coordinator is now ready
   ./psql test -p 5455
   create table test_new_coord(a int, b int);
   \q
   ./psql test -p 5432
   select * from test_new_coord;

======================================
======================================
Here are the steps to add a new datanode

1) Initdb the new datanode
   /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data3 --nodename data_node_3
2) Make the necessary changes in its postgresql.conf, in particular specify the new datanode name
3) Connect to any of the existing coordinators and lock the cluster for backup; do not close this session
   ./psql postgres -p 5432
   select pgxc_lock_for_backup();
4) Connect to any of the existing datanodes and take a backup of the database
   ./pg_dumpall -p 15432 -s --include-nodes --file=/home/edb/Desktop/NodeAddition/revised_patches/misc_dumps/1122_all_objects_dn1.sql
5) Start the new datanode, specifying --restoremode
   ./postgres --restoremode -D ../data3 -p 35432
6) Restore the backup that was taken from an existing datanode by connecting to the new datanode directly
   ./psql -d postgres -f /home/edb/Desktop/NodeAddition/revised_patches/misc_dumps/1122_all_objects_dn1.sql -p 35432
7) Quit the new datanode
8) Start the new datanode as a datanode by specifying --datanode
   ./postgres --datanode -D ../data3 -p 35432
9) Create the new datanode on all the coordinators and reload the configuration
   CREATE NODE DATA_NODE_3 WITH (HOST = 'localhost', type = 'datanode', PORT = 35432);
   SELECT pgxc_pool_reload();
10) Quit the session of step 3; this will unlock the cluster
11) Redistribute data by using ALTER TABLE REDISTRIBUTE
12) The new datanode is now ready
   ./psql test
   create table test_new_dn(a
int, b int) distribute by replication; insert into test_new_dn values(1,2); EXECUTE DIRECT ON (data_node_1) 'SELECT * from test_new_dn'; EXECUTE DIRECT ON (data_node_2) 'SELECT * from test_new_dn'; EXECUTE DIRECT ON (data_node_3) 'SELECT * from test_new_dn'; ====================================== On Wed, Mar 27, 2013 at 5:02 PM, Abbas Butt <abb...@en...>wrote: > Feature ID 3608379 > > On Fri, Mar 1, 2013 at 5:48 PM, Amit Khandekar < > ami...@en...> wrote: > >> On 1 March 2013 01:30, Abbas Butt <abb...@en...> wrote: >> > >> > >> > On Thu, Feb 28, 2013 at 12:44 PM, Amit Khandekar >> > <ami...@en...> wrote: >> >> >> >> >> >> >> >> On 28 February 2013 10:23, Abbas Butt <abb...@en...> >> wrote: >> >>> >> >>> Hi All, >> >>> >> >>> Attached please find a patch that provides a new command line argument >> >>> for postgres called --restoremode. >> >>> >> >>> While adding a new node to the cluster we need to restore the schema >> of >> >>> existing database to the new node. >> >>> If the new node is a datanode and we connect directly to it, it does >> not >> >>> allow DDL, because it is in read only mode & >> >>> If the new node is a coordinator, it will send DDLs to all the other >> >>> coordinators which we do not want it to do. >> >> >> >> >> >> What if we allow writes in standalone mode, so that we would initialize >> >> the new node using standalone mode instead of --restoremode ? >> > >> > >> > Please take a look at the patch, I am using --restoremode in place of >> > --coordinator & --datanode. I am not sure how would stand alone mode >> fit in >> > here. >> >> I was trying to see if we can avoid adding a new mode, instead, use >> standalone mode for all the purposes for which restoremode is used. >> Actually I checked the documentation, it says this mode is used only >> for debugging or recovery purposes, so now I myself am a bit hesitent >> about this mode for the purpose of restoring. >> >> > >> >> >> >> >> >>> >> >>> To provide ability to restore on the new node a new command line >> argument >> >>> is provided. >> >>> It is to be provided in place of --coordinator OR --datanode. >> >>> In restore mode both coordinator and datanode are internally treated >> as a >> >>> datanode. >> >>> For more details see patch comments. >> >>> >> >>> After this patch one can add a new node to the cluster. >> >>> >> >>> Here are the steps to add a new coordinator >> >>> >> >>> >> >>> 1) Initdb new coordinator >> >>> /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data_cord3 >> >>> --nodename coord_3 >> >>> >> >>> 2) Make necessary changes in its postgresql.conf, in particular >> specify >> >>> new coordinator name and pooler port >> >>> >> >>> 3) Connect to any of the existing coordinators & lock the cluster for >> >>> backup >> >>> ./psql postgres -p 5432 >> >>> SET xc_lock_for_backup=yes; >> >>> \q >> >> >> >> >> >> I haven't given a thought on the earlier patch you sent for cluster >> lock >> >> implementation; may be we can discuss this on that thread, but just a >> quick >> >> question: >> >> >> >> Does the cluster-lock command wait for the ongoing DDL commands to >> finish >> >> ? If not, we have problems. The subsequent pg_dump would not contain >> objects >> >> created by these particular DDLs. >> > >> > >> > Suppose you have a two coordinator cluster. Assume one client connected >> to >> > each. Suppose one client issues a lock cluster command and the other >> issues >> > a DDL. Is this what you mean by an ongoing DDL? If true then answer to >> your >> > question is Yes. 
>> > >> > Suppose you have a prepared transaction that has a DDL in it, again if >> this >> > can be considered an on going DDL, then again answer to your question is >> > Yes. >> > >> > Suppose you have a two coordinator cluster. Assume one client connected >> to >> > each. One client starts a transaction and issues a DDL, the second >> client >> > issues a lock cluster command, the first commits the transaction. If >> this is >> > an ongoing DDL, then the answer to your question is No. But its a >> matter of >> > deciding which camp are we going to put COMMIT in, the allow camp, or >> the >> > deny camp. I decided to put it in allow camp, because I have not yet >> written >> > any code to detect whether a transaction being committed has a DDL in >> it or >> > not, and stopping all transactions from committing looks too >> restrictive to >> > me. >> > >> > Do you have some other meaning of an ongoing DDL? >> > >> > I agree that we should have discussed this on the right thread. Lets >> > continue this discussion on that thread. >> >> Continued on the other thread. >> >> > >> >> >> >> >> >>> >> >>> >> >>> 4) Connect to any of the existing coordinators and take backup of the >> >>> database >> >>> ./pg_dump -p 5432 -C -s >> >>> --file=/home/edb/Desktop/NodeAddition/dumps/101_all_objects_coord.sql >> test >> >>> >> >>> 5) Start the new coordinator specify --restoremode while starting the >> >>> coordinator >> >>> ./postgres --restoremode -D ../data_cord3 -p 5455 >> >>> >> >>> 6) connect to the new coordinator directly >> >>> ./psql postgres -p 5455 >> >>> >> >>> 7) create all the datanodes and the rest of the coordinators on the >> new >> >>> coordiantor & reload configuration >> >>> CREATE NODE DATA_NODE_1 WITH (HOST = 'localhost', type = >> >>> 'datanode', PORT = 15432, PRIMARY); >> >>> CREATE NODE DATA_NODE_2 WITH (HOST = 'localhost', type = >> >>> 'datanode', PORT = 25432); >> >>> >> >>> CREATE NODE COORD_1 WITH (HOST = 'localhost', type = >> >>> 'coordinator', PORT = 5432); >> >>> CREATE NODE COORD_2 WITH (HOST = 'localhost', type = >> >>> 'coordinator', PORT = 5433); >> >>> >> >>> SELECT pgxc_pool_reload(); >> >>> >> >>> 8) quit psql >> >>> >> >>> 9) Create the new database on the new coordinator >> >>> ./createdb test -p 5455 >> >>> >> >>> 10) create the roles and table spaces manually, the dump does not >> contain >> >>> roles or table spaces >> >>> ./psql test -p 5455 >> >>> CREATE ROLE admin WITH LOGIN CREATEDB CREATEROLE; >> >>> CREATE TABLESPACE my_space LOCATION >> >>> '/usr/local/pgsql/my_space_location'; >> >>> \q >> >>> >> >> >> >> Will pg_dumpall help ? It dumps roles also. >> > >> > >> > Yah , but I am giving example of pg_dump so this step has to be there. >> > >> >> >> >> >> >> >> >>> >> >>> 11) Restore the backup that was taken from an existing coordinator by >> >>> connecting to the new coordinator directly >> >>> ./psql -d test -f >> >>> /home/edb/Desktop/NodeAddition/dumps/101_all_objects_coord.sql -p 5455 >> >>> >> >>> 11) Quit the new coordinator >> >>> >> >>> 12) Connect to any of the existing coordinators & unlock the cluster >> >>> ./psql postgres -p 5432 >> >>> SET xc_lock_for_backup=no; >> >>> \q >> >>> >> >> >> >> Unlocking the cluster has to be done *after* the node is added into the >> >> cluster. >> > >> > >> > Very true. I stand corrected. This means CREATE NODE has to be allowed >> when >> > xc_lock_for_backup is set. 
>> > >> >> >> >> >> >> >> >>> >> >>> 13) Start the new coordinator as a by specifying --coordinator >> >>> ./postgres --coordinator -D ../data_cord3 -p 5455 >> >>> >> >>> 14) Create the new coordinator on rest of the coordinators and reload >> >>> configuration >> >>> CREATE NODE COORD_3 WITH (HOST = 'localhost', type = >> >>> 'coordinator', PORT = 5455); >> >>> SELECT pgxc_pool_reload(); >> >>> >> >>> 15) The new coordinator is now ready >> >>> ./psql test -p 5455 >> >>> create table test_new_coord(a int, b int); >> >>> \q >> >>> ./psql test -p 5432 >> >>> select * from test_new_coord; >> >>> >> >>> >> >>> Here are the steps to add a new datanode >> >>> >> >>> >> >>> 1) Initdb new datanode >> >>> /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data3 >> --nodename >> >>> data_node_3 >> >>> >> >>> 2) Make necessary changes in its postgresql.conf, in particular >> specify >> >>> new datanode name >> >>> >> >>> 3) Connect to any of the existing coordinators & lock the cluster for >> >>> backup >> >>> ./psql postgres -p 5432 >> >>> SET xc_lock_for_backup=yes; >> >>> \q >> >>> >> >>> 4) Connect to any of the existing datanodes and take backup of the >> >>> database >> >>> ./pg_dump -p 15432 -C -s >> >>> --file=/home/edb/Desktop/NodeAddition/dumps/102_all_objects_dn1.sql >> test >> >>> >> >>> 5) Start the new datanode specify --restoremode while starting the it >> >>> ./postgres --restoremode -D ../data3 -p 35432 >> >>> >> >>> 6) Create the new database on the new datanode >> >>> ./createdb test -p 35432 >> >>> >> >>> 7) create the roles and table spaces manually, the dump does not >> contain >> >>> roles or table spaces >> >>> ./psql test -p 35432 >> >>> CREATE ROLE admin WITH LOGIN CREATEDB CREATEROLE; >> >>> CREATE TABLESPACE my_space LOCATION >> >>> '/usr/local/pgsql/my_space_location'; >> >>> \q >> >>> >> >>> 8) Restore the backup that was taken from an existing datanode by >> >>> connecting to the new datanode directly >> >>> ./psql -d test -f >> >>> /home/edb/Desktop/NodeAddition/dumps/102_all_objects_dn1.sql -p 35432 >> >>> >> >>> 9) Quit the new datanode >> >>> >> >>> 10) Connect to any of the existing coordinators & unlock the cluster >> >>> ./psql postgres -p 5432 >> >>> SET xc_lock_for_backup=no; >> >>> \q >> >>> >> >>> 11) Start the new datanode as a datanode by specifying --datanode >> >>> ./postgres --datanode -D ../data3 -p 35432 >> >>> >> >>> 12) Create the new datanode on all the coordinators and reload >> >>> configuration >> >>> CREATE NODE DATA_NODE_3 WITH (HOST = 'localhost', type = >> >>> 'datanode', PORT = 35432); >> >>> SELECT pgxc_pool_reload(); >> >>> >> >>> 13) Redistribute data by using ALTER TABLE REDISTRIBUTE >> >>> >> >>> 14) The new daatnode is now ready >> >>> ./psql test >> >>> create table test_new_dn(a int, b int) distribute by >> replication; >> >>> insert into test_new_dn values(1,2); >> >>> EXECUTE DIRECT ON (data_node_1) 'SELECT * from test_new_dn'; >> >>> EXECUTE DIRECT ON (data_node_2) 'SELECT * from test_new_dn'; >> >>> EXECUTE DIRECT ON (data_node_3) 'SELECT * from test_new_dn'; >> >>> >> >>> Please note that the steps assume that the patch sent earlier >> >>> 1_lock_cluster.patch in mail subject [Patch to lock cluster] is >> applied. >> >>> >> >>> I have also attached test database scripts, that would help in patch >> >>> review. >> >>> >> >>> Comments are welcome. 
>> >>> >> >>> -- >> >>> Abbas >> >>> Architect >> >>> EnterpriseDB Corporation >> >>> The Enterprise PostgreSQL Company >> >>> >> >>> Phone: 92-334-5100153 >> >>> >> >>> Website: www.enterprisedb.com >> >>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> >>> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >>> >> >>> This e-mail message (and any attachment) is intended for the use of >> >>> the individual or entity to whom it is addressed. This message >> >>> contains information from EnterpriseDB Corporation that may be >> >>> privileged, confidential, or exempt from disclosure under applicable >> >>> law. If you are not the intended recipient or authorized to receive >> >>> this for the intended recipient, any use, dissemination, distribution, >> >>> retention, archiving, or copying of this communication is strictly >> >>> prohibited. If you have received this e-mail in error, please notify >> >>> the sender immediately by reply e-mail and delete this message. >> >>> >> >>> >> ------------------------------------------------------------------------------ >> >>> Everyone hates slow websites. So do we. >> >>> Make your web apps faster with AppDynamics >> >>> Download AppDynamics Lite for free today: >> >>> http://p.sf.net/sfu/appdyn_d2d_feb >> >>> _______________________________________________ >> >>> Postgres-xc-developers mailing list >> >>> Pos...@li... >> >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >>> >> >> >> > >> > >> > >> > -- >> > -- >> > Abbas >> > Architect >> > EnterpriseDB Corporation >> > The Enterprise PostgreSQL Company >> > >> > Phone: 92-334-5100153 >> > >> > Website: www.enterprisedb.com >> > EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> > Follow us on Twitter: http://www.twitter.com/enterprisedb >> > >> > This e-mail message (and any attachment) is intended for the use of >> > the individual or entity to whom it is addressed. This message >> > contains information from EnterpriseDB Corporation that may be >> > privileged, confidential, or exempt from disclosure under applicable >> > law. If you are not the intended recipient or authorized to receive >> > this for the intended recipient, any use, dissemination, distribution, >> > retention, archiving, or copying of this communication is strictly >> > prohibited. If you have received this e-mail in error, please notify >> > the sender immediately by reply e-mail and delete this message. >> > > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. 
If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
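A minimal SQL sketch of the cluster-side registration and verification described in the steps above, assuming the example node names and ports from that message (COORD_3 on port 5455, DATA_NODE_3 on port 35432); these statements would be run from a psql session on each existing coordinator:

    -- Register the new coordinator (coordinator step 10) and the new datanode
    -- (datanode step 9) on every existing coordinator, then refresh the pooler.
    CREATE NODE COORD_3 WITH (HOST = 'localhost', type = 'coordinator', PORT = 5455);
    CREATE NODE DATA_NODE_3 WITH (HOST = 'localhost', type = 'datanode', PORT = 35432);
    SELECT pgxc_pool_reload();

    -- Verification, as in the final steps of the message: create a replicated table,
    -- insert a row, and read it back directly from the new datanode.
    CREATE TABLE test_new_dn(a int, b int) DISTRIBUTE BY REPLICATION;
    INSERT INTO test_new_dn VALUES (1, 2);
    EXECUTE DIRECT ON (data_node_3) 'SELECT * FROM test_new_dn';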
From: Kaiji C. <ch...@im...> - 2013-03-30 12:35:26
Thanks for your help, I'll use this solution in my project. On Mar 30, 2013, at 12:40 PM, "Abbas Butt" <abb...@en...<mailto:abb...@en...>> wrote: On Fri, Mar 29, 2013 at 3:19 PM, Kaiji Chen <ch...@im...<mailto:ch...@im...>> wrote: Hi, I'm working on a data partitioning project on PostgreSQL by adding a middleware between the database cluster interface and applications that modify the SQL statement to specific data nodes. I just find that PostgresXC has a nice GTM that can help me do the distributed transaction management works, I considered to transfer my project on it. It seems the sliders (http://wiki.postgresql.org/images/f/f6/PGXC_Scalability_PGOpen2012.pdf) intend that user defined table distribution is not available, but the coordinator can choose specific data node when processing the queries, and the table will be distributed to by default if DISTRIBUTED BY is not specified. Then I wonder if I can specify a data node in each query and stop the default auto distributing process. Here is what you can do. Add a column of type int in the table and distribute the table by modulo of the added column. Now if you want to specify in your query that the insert should go to first data node use value 0 for the added column, for second data node use 1 and so on. Off course a better way would be a add support for a user defined function for computing target data node in XC, but the above idea is valid for the current implementation. ------------------------------------------------------------------------------ Own the Future-Intel(R) Level Up Game Demo Contest 2013 Rise to greatness in Intel's independent game demo contest. Compete for recognition, cash, and the chance to get your game on Steam. $5K grand prize plus 10 genre and skill prizes. Submit your demo by 6/6/13. http://altfarm.mediaplex.com/ad/ck/12124-176961-30367-2 _______________________________________________ Postgres-xc-developers mailing list Pos...@li...<mailto:Pos...@li...> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com<http://www.enterprisedb.com/> EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Abbas B. <abb...@en...> - 2013-03-30 11:40:33
On Fri, Mar 29, 2013 at 3:19 PM, Kaiji Chen <ch...@im...> wrote:
> Hi,
> I'm working on a data partitioning project on PostgreSQL by adding a
> middleware between the database cluster interface and applications that
> modify the SQL statement to specific data nodes. I just find that
> PostgresXC has a nice GTM that can help me do the distributed transaction
> management works, I considered to transfer my project on it.
> It seems the sliders (
> http://wiki.postgresql.org/images/f/f6/PGXC_Scalability_PGOpen2012.pdf)
> intend that user defined table distribution is not available, but the
> coordinator can choose specific data node when processing the queries, and
> the table will be distributed to by default if DISTRIBUTED BY is not
> specified. Then I wonder if I can specify a data node in each query and
> stop the default auto distributing process.

Here is what you can do. Add a column of type int in the table and distribute the table by modulo of the added column. Now if you want to specify in your query that the insert should go to the first data node, use value 0 for the added column; for the second data node use 1, and so on. Of course a better way would be to add support for a user-defined function for computing the target data node in XC, but the above idea is valid for the current implementation.

> ------------------------------------------------------------------------------
> Own the Future-Intel(R) Level Up Game Demo Contest 2013
> Rise to greatness in Intel's independent game demo contest. Compete
> for recognition, cash, and the chance to get your game on Steam.
> $5K grand prize plus 10 genre and skill prizes. Submit your demo
> by 6/6/13. http://altfarm.mediaplex.com/ad/ck/12124-176961-30367-2
> _______________________________________________
> Postgres-xc-developers mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers

--
Abbas
Architect
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: 92-334-5100153

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message.
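A small sketch of the workaround described above, using hypothetical table and column names (orders, node_key); the added int column is the modulo distribution key that routes each row to a chosen datanode:

    -- Hypothetical table; node_key is the extra routing column suggested above.
    CREATE TABLE orders (
        id       int,
        payload  text,
        node_key int
    ) DISTRIBUTE BY MODULO(node_key);

    -- Per the suggestion: value 0 targets the first datanode, 1 the second, and so on.
    INSERT INTO orders VALUES (1, 'row for the first datanode', 0);
    INSERT INTO orders VALUES (2, 'row for the second datanode', 1);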
From: Kaiji C. <ch...@im...> - 2013-03-30 09:12:24
Thanks for your reply! It seems OK if I use EXECUTE DIRECT and manually maintain concurrency control and a global index in my middleware, but that would bypass the Postgres-XC coordinator, so it would not be the best choice. I have come up with an idea that applies an external data partitioning design to Postgres-XC. As stated in the documentation, XC can distribute tables to data nodes using a hash function. Could I then alter the original table, add a new column that carries my partitioning decision, and let the table be distributed by this column? We would then add this column to the compound primary key of the table and let the coordinator deal with the query planning work. I think this can be done if, across different tables, the same hash value is always partitioned to the same data node, as long as there is no modification to the set of data nodes.

Yours,
Kaiji Chen
PhD Candidate, IMADA, Southern Denmark University
Email: ch...@im...

________________________________
From: Michael Paquier [mic...@gm...]
Sent: Saturday, March 30, 2013 5:55 AM
To: Kaiji Chen
Cc: pos...@li...
Subject: Re: [Postgres-xc-developers] Manually Table Partitioning

On Fri, Mar 29, 2013 at 7:19 PM, Kaiji Chen <ch...@im...> wrote:
Hi, I'm working on a data partitioning project on PostgreSQL by adding a middleware between the database cluster interface and applications that modify the SQL statement to specific data nodes. I just find that PostgresXC has a nice GTM that can help me do the distributed transaction management works, I considered to transfer my project on it. It seems the sliders (http://wiki.postgresql.org/images/f/f6/PGXC_Scalability_PGOpen2012.pdf) intend that user defined table distribution is not available, but the coordinator can choose specific data node when processing the queries, and the table will be distributed to by default if DISTRIBUTED BY is not specified. Then I wonder if I can specify a data node in each query and stop the default auto distributing process.

For SELECT queries, you can use EXECUTE DIRECT: http://postgres-xc.sourceforge.net/docs/1_0_2/sql-executedirect.html The results you get might not be exact as not global query planning is not done and the query string is sent as-is. Note that you cannot use EXECUTE DIRECT with DML or the whole cluster consistency would be broken.
--
Michael
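A sketch of the idea above with hypothetical names: the externally computed partition decision is stored in its own column, included in the compound primary key, and used as the hash distribution column so the coordinator can plan queries around it:

    -- part_key carries the partition decision computed by the middleware.
    CREATE TABLE measurements (
        id       int,
        part_key int,
        val      double precision,
        PRIMARY KEY (id, part_key)   -- the distribution column must be part of the key
    ) DISTRIBUTE BY HASH(part_key);

    -- A second table distributed by the same column. The idea assumes that equal
    -- part_key values land on the same datanode for both tables, which holds only
    -- while the set of datanodes is unchanged.
    CREATE TABLE measurement_tags (
        id       int,
        part_key int,
        tag      text,
        PRIMARY KEY (id, part_key)
    ) DISTRIBUTE BY HASH(part_key);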
From: Michael P. <mic...@gm...> - 2013-03-30 04:55:12
On Fri, Mar 29, 2013 at 7:19 PM, Kaiji Chen <ch...@im...> wrote:
> Hi,
> I'm working on a data partitioning project on PostgreSQL by adding a
> middleware between the database cluster interface and applications that
> modify the SQL statement to specific data nodes. I just find that
> PostgresXC has a nice GTM that can help me do the distributed transaction
> management works, I considered to transfer my project on it.
> It seems the sliders (
> http://wiki.postgresql.org/images/f/f6/PGXC_Scalability_PGOpen2012.pdf)
> intend that user defined table distribution is not available, but the
> coordinator can choose specific data node when processing the queries, and
> the table will be distributed to by default if DISTRIBUTED BY is not
> specified. Then I wonder if I can specify a data node in each query and
> stop the default auto distributing process.

For SELECT queries, you can use EXECUTE DIRECT: http://postgres-xc.sourceforge.net/docs/1_0_2/sql-executedirect.html The results you get might not be exact, because global query planning is not done and the query string is sent as-is. Note that you cannot use EXECUTE DIRECT with DML, or the whole cluster's consistency would be broken.
--
Michael
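For reference, the usage pattern behind the documentation link above, with a hypothetical table name; EXECUTE DIRECT ships the quoted query string to the named node without global planning:

    -- Read-only query sent verbatim to a single datanode.
    EXECUTE DIRECT ON (data_node_1) 'SELECT count(*) FROM my_table';

    -- As noted above, EXECUTE DIRECT must not be used for DML, since that
    -- would break cluster-wide consistency.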
From: Abbas B. <abb...@en...> - 2013-03-29 12:33:09
Hi, Attached please find a revised patch that provides support in pg_dumpall to dump nodes and node groups if the command line option --dump-nodes is provided. I tested and found that pg_dumpall works as expected. On Wed, Mar 27, 2013 at 5:04 PM, Abbas Butt <abb...@en...>wrote: > Feature ID 3608376 > > On Sun, Mar 10, 2013 at 7:59 PM, Abbas Butt <abb...@en...>wrote: > >> Hi, >> Attached please find a patch that adds support in pg_dump to dump nodes >> and node groups. This is required while adding a new node to the cluster. >> >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> Phone: 92-334-5100153 >> >> Website: www.enterprisedb.com >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. > > > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Kaiji C. <ch...@im...> - 2013-03-29 10:35:49
Hi, I'm working on a data partitioning project on PostgreSQL: I am adding a middleware between the database cluster interface and the applications that rewrites SQL statements to target specific data nodes. I just found that Postgres-XC has a nice GTM that can handle the distributed transaction management work for me, so I am considering porting my project to it. The slides (http://wiki.postgresql.org/images/f/f6/PGXC_Scalability_PGOpen2012.pdf) suggest that user-defined table distribution is not available, but that the coordinator can choose a specific data node when processing queries, and that a table is distributed with a default strategy if DISTRIBUTE BY is not specified. So I wonder whether I can specify a data node in each query and stop the default automatic distribution.
From: Abbas B. <abb...@en...> - 2013-03-27 12:06:03
Feature ID 3608375 On Tue, Mar 5, 2013 at 1:45 PM, Abbas Butt <abb...@en...>wrote: > The attached patch changes the name of the option to --include-nodes. > > > On Mon, Mar 4, 2013 at 2:41 PM, Abbas Butt <abb...@en...>wrote: > >> >> >> On Mon, Mar 4, 2013 at 2:09 PM, Ashutosh Bapat < >> ash...@en...> wrote: >> >>> >>> >>> On Mon, Mar 4, 2013 at 1:51 PM, Abbas Butt <abb...@en...>wrote: >>> >>>> What I had in mind was to have pg_dump, when run with include-node, >>>> emit CREATE NODE/ CREATE NODE GROUP commands only and nothing else. Those >>>> commands will be used to create existing nodes/groups on the new >>>> coordinator to be added. So it does make sense to use this option >>>> independently, in fact it is supposed to be used independently. >>>> >>>> >>> Ok, got it. But then include-node is really a misnomer. We should use >>> --dump-nodes or something like that. >>> >> >> In that case we can use include-nodes here. >> >> >>> >>> >>>> >>>> On Mon, Mar 4, 2013 at 11:21 AM, Ashutosh Bapat < >>>> ash...@en...> wrote: >>>> >>>>> Dumping TO NODE clause only makes sense if we dump CREATE NODE/ CREATE >>>>> NODE GROUP. Dumping CREATE NODE/CREATE NODE GROUP may make sense >>>>> independently, but might be useless without dumping TO NODE clause. >>>>> >>>>> BTW, OTOH, dumping CREATE NODE/CREATE NODE GROUP clause wouldn't >>>>> create the nodes on all the coordinators, >>>> >>>> >>>> All the coordinators already have the nodes information. >>>> >>>> >>>>> but only the coordinator where dump will be restored. That's another >>>>> thing you will need to consider OR are you going to fix that as well? >>>> >>>> >>>> As a first step I am only listing the manual steps required to add a >>>> new node, that might say run this command on all the existing coordinators >>>> by connecting to them one by one manually. We can decide to automate these >>>> steps later. >>>> >>>> >>> ok >>> >>> >>> >>>> >>>> >>>>> >>>>> >>>>> On Mon, Mar 4, 2013 at 11:41 AM, Abbas Butt < >>>>> abb...@en...> wrote: >>>>> >>>>>> I was thinking of using include-nodes to dump CREATE NODE / CREATE >>>>>> NODE GROUP, that is required as one of the missing links in adding a new >>>>>> node. How do you think about that? >>>>>> >>>>>> >>>>>> On Mon, Mar 4, 2013 at 9:02 AM, Ashutosh Bapat < >>>>>> ash...@en...> wrote: >>>>>> >>>>>>> Hi Abbas, >>>>>>> Please take a look at >>>>>>> http://www.postgresql.org/docs/9.2/static/app-pgdump.html, which >>>>>>> gives all the command line options for pg_dump. instead of >>>>>>> include-to-node-clause, just include-nodes would suffice, I guess. >>>>>>> >>>>>>> >>>>>>> On Fri, Mar 1, 2013 at 8:36 PM, Abbas Butt < >>>>>>> abb...@en...> wrote: >>>>>>> >>>>>>>> PFA a updated patch that provides a command line argument called >>>>>>>> --include-to-node-clause to let pg_dump know that the created dump is >>>>>>>> supposed to emit TO NODE clause in the CREATE TABLE command. >>>>>>>> If the argument is provided while taking the dump from a datanode, >>>>>>>> it does not show TO NODE clause in the dump since the catalog table is >>>>>>>> empty in this case. >>>>>>>> The documentation of pg_dump is updated accordingly. >>>>>>>> The rest of the functionality stays the same as before. >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Feb 25, 2013 at 10:29 AM, Ashutosh Bapat < >>>>>>>> ash...@en...> wrote: >>>>>>>> >>>>>>>>> I think we should always dump DISTRIBUTE BY. 
>>>>>>>>> >>>>>>>>> PG does not stop dumping (or provide an option to do so) newer >>>>>>>>> syntax so that the dump will work on older versions. On similar lines, an >>>>>>>>> XC dump can not be used against PG without modification (removing >>>>>>>>> DISTRIBUTE BY). There can be more serious problems like exceeding table >>>>>>>>> size limits if an XC dump is tried to be restored in PG. >>>>>>>>> >>>>>>>>> As to TO NODE clause, I agree, that one can restore the dump on a >>>>>>>>> cluster with different configuration, so giving an option to dump TO NODE >>>>>>>>> clause will help. >>>>>>>>> >>>>>>>>> On Mon, Feb 25, 2013 at 6:42 AM, Michael Paquier < >>>>>>>>> mic...@gm...> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Feb 25, 2013 at 4:17 AM, Abbas Butt < >>>>>>>>>> abb...@en...> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Sun, Feb 24, 2013 at 5:33 PM, Michael Paquier < >>>>>>>>>>> mic...@gm...> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Sun, Feb 24, 2013 at 7:04 PM, Abbas Butt < >>>>>>>>>>>> abb...@en...> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Sun, Feb 24, 2013 at 1:44 PM, Michael Paquier < >>>>>>>>>>>>> mic...@gm...> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sun, Feb 24, 2013 at 3:51 PM, Abbas Butt < >>>>>>>>>>>>>> abb...@en...> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> PFA a patch to fix pg_dump to generate TO NODE clause in >>>>>>>>>>>>>>> the dump. >>>>>>>>>>>>>>> This is required because otherwise all tables get created on >>>>>>>>>>>>>>> all nodes after a dump-restore cycle. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> Not sure this is good if you take a dump of an XC cluster to >>>>>>>>>>>>>> restore that to a vanilla Postgres cluster. >>>>>>>>>>>>>> Why not adding a new option that would control the generation >>>>>>>>>>>>>> of this clause instead of forcing it? >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I think you can use the pg_dump that comes with vanilla PG to >>>>>>>>>>>>> do that, can't you? But I am open to adding a control option if every body >>>>>>>>>>>>> thinks so. >>>>>>>>>>>>> >>>>>>>>>>>> Sure you can, this is just to simplify the life of users a >>>>>>>>>>>> maximum by not having multiple pg_dump binaries in their serves. >>>>>>>>>>>> Saying that, I think that there is no option to choose if >>>>>>>>>>>> DISTRIBUTE BY is printed in the dump or not... >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Yah if we choose to have an option we will put both DISTRIBUTE >>>>>>>>>>> BY and TO NODE under it. >>>>>>>>>>> >>>>>>>>>> Why not an option for DISTRIBUTE BY, and another for TO NODE? >>>>>>>>>> This would bring more flexibility to the way dumps are generated. >>>>>>>>>> -- >>>>>>>>>> Michael >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ------------------------------------------------------------------------------ >>>>>>>>>> Everyone hates slow websites. So do we. >>>>>>>>>> Make your web apps faster with AppDynamics >>>>>>>>>> Download AppDynamics Lite for free today: >>>>>>>>>> http://p.sf.net/sfu/appdyn_d2d_feb >>>>>>>>>> _______________________________________________ >>>>>>>>>> Postgres-xc-developers mailing list >>>>>>>>>> Pos...@li... 
>>>>>>>>>> >>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Best Wishes, >>>>>>>>> Ashutosh Bapat >>>>>>>>> EntepriseDB Corporation >>>>>>>>> The Enterprise Postgres Company >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> -- >>>>>>>> Abbas >>>>>>>> Architect >>>>>>>> EnterpriseDB Corporation >>>>>>>> The Enterprise PostgreSQL Company >>>>>>>> >>>>>>>> Phone: 92-334-5100153 >>>>>>>> >>>>>>>> Website: www.enterprisedb.com >>>>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>>>>> >>>>>>>> This e-mail message (and any attachment) is intended for the use of >>>>>>>> the individual or entity to whom it is addressed. This message >>>>>>>> contains information from EnterpriseDB Corporation that may be >>>>>>>> privileged, confidential, or exempt from disclosure under applicable >>>>>>>> law. If you are not the intended recipient or authorized to receive >>>>>>>> this for the intended recipient, any use, dissemination, >>>>>>>> distribution, >>>>>>>> retention, archiving, or copying of this communication is strictly >>>>>>>> prohibited. If you have received this e-mail in error, please notify >>>>>>>> the sender immediately by reply e-mail and delete this message. >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Best Wishes, >>>>>>> Ashutosh Bapat >>>>>>> EntepriseDB Corporation >>>>>>> The Enterprise Postgres Company >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> -- >>>>>> Abbas >>>>>> Architect >>>>>> EnterpriseDB Corporation >>>>>> The Enterprise PostgreSQL Company >>>>>> >>>>>> Phone: 92-334-5100153 >>>>>> >>>>>> Website: www.enterprisedb.com >>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>>> >>>>>> This e-mail message (and any attachment) is intended for the use of >>>>>> the individual or entity to whom it is addressed. This message >>>>>> contains information from EnterpriseDB Corporation that may be >>>>>> privileged, confidential, or exempt from disclosure under applicable >>>>>> law. If you are not the intended recipient or authorized to receive >>>>>> this for the intended recipient, any use, dissemination, distribution, >>>>>> retention, archiving, or copying of this communication is strictly >>>>>> prohibited. If you have received this e-mail in error, please notify >>>>>> the sender immediately by reply e-mail and delete this message. >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Best Wishes, >>>>> Ashutosh Bapat >>>>> EntepriseDB Corporation >>>>> The Enterprise Postgres Company >>>>> >>>> >>>> >>>> >>>> -- >>>> -- >>>> Abbas >>>> Architect >>>> EnterpriseDB Corporation >>>> The Enterprise PostgreSQL Company >>>> >>>> Phone: 92-334-5100153 >>>> >>>> Website: www.enterprisedb.com >>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>> >>>> This e-mail message (and any attachment) is intended for the use of >>>> the individual or entity to whom it is addressed. 
This message >>>> contains information from EnterpriseDB Corporation that may be >>>> privileged, confidential, or exempt from disclosure under applicable >>>> law. If you are not the intended recipient or authorized to receive >>>> this for the intended recipient, any use, dissemination, distribution, >>>> retention, archiving, or copying of this communication is strictly >>>> prohibited. If you have received this e-mail in error, please notify >>>> the sender immediately by reply e-mail and delete this message. >>>> >>> >>> >>> >>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Enterprise Postgres Company >>> >> >> >> >> -- >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> Phone: 92-334-5100153 >> >> Website: www.enterprisedb.com >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. >> > > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
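A sketch of the kind of table definition such a dump would emit when the option is enabled, with hypothetical table, column, and node names; the exact spelling of the TO NODE clause is assumed here from the discussion above and may differ between XC versions:

    -- DISTRIBUTE BY records the distribution strategy; TO NODE restricts the table
    -- to the listed datanodes instead of letting it be created on every node.
    CREATE TABLE my_table (
        id  int,
        val text
    ) DISTRIBUTE BY HASH(id) TO NODE (data_node_1, data_node_2);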
From: Abbas B. <abb...@en...> - 2013-03-27 12:05:03
Feature ID 3608376 On Sun, Mar 10, 2013 at 7:59 PM, Abbas Butt <abb...@en...>wrote: > Hi, > Attached please find a patch that adds support in pg_dump to dump nodes > and node groups. This is required while adding a new node to the cluster. > > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Abbas B. <abb...@en...> - 2013-03-27 12:02:46
Feature ID 3608379 On Fri, Mar 1, 2013 at 5:48 PM, Amit Khandekar < ami...@en...> wrote: > On 1 March 2013 01:30, Abbas Butt <abb...@en...> wrote: > > > > > > On Thu, Feb 28, 2013 at 12:44 PM, Amit Khandekar > > <ami...@en...> wrote: > >> > >> > >> > >> On 28 February 2013 10:23, Abbas Butt <abb...@en...> > wrote: > >>> > >>> Hi All, > >>> > >>> Attached please find a patch that provides a new command line argument > >>> for postgres called --restoremode. > >>> > >>> While adding a new node to the cluster we need to restore the schema of > >>> existing database to the new node. > >>> If the new node is a datanode and we connect directly to it, it does > not > >>> allow DDL, because it is in read only mode & > >>> If the new node is a coordinator, it will send DDLs to all the other > >>> coordinators which we do not want it to do. > >> > >> > >> What if we allow writes in standalone mode, so that we would initialize > >> the new node using standalone mode instead of --restoremode ? > > > > > > Please take a look at the patch, I am using --restoremode in place of > > --coordinator & --datanode. I am not sure how would stand alone mode fit > in > > here. > > I was trying to see if we can avoid adding a new mode, instead, use > standalone mode for all the purposes for which restoremode is used. > Actually I checked the documentation, it says this mode is used only > for debugging or recovery purposes, so now I myself am a bit hesitent > about this mode for the purpose of restoring. > > > > >> > >> > >>> > >>> To provide ability to restore on the new node a new command line > argument > >>> is provided. > >>> It is to be provided in place of --coordinator OR --datanode. > >>> In restore mode both coordinator and datanode are internally treated > as a > >>> datanode. > >>> For more details see patch comments. > >>> > >>> After this patch one can add a new node to the cluster. > >>> > >>> Here are the steps to add a new coordinator > >>> > >>> > >>> 1) Initdb new coordinator > >>> /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data_cord3 > >>> --nodename coord_3 > >>> > >>> 2) Make necessary changes in its postgresql.conf, in particular > specify > >>> new coordinator name and pooler port > >>> > >>> 3) Connect to any of the existing coordinators & lock the cluster for > >>> backup > >>> ./psql postgres -p 5432 > >>> SET xc_lock_for_backup=yes; > >>> \q > >> > >> > >> I haven't given a thought on the earlier patch you sent for cluster lock > >> implementation; may be we can discuss this on that thread, but just a > quick > >> question: > >> > >> Does the cluster-lock command wait for the ongoing DDL commands to > finish > >> ? If not, we have problems. The subsequent pg_dump would not contain > objects > >> created by these particular DDLs. > > > > > > Suppose you have a two coordinator cluster. Assume one client connected > to > > each. Suppose one client issues a lock cluster command and the other > issues > > a DDL. Is this what you mean by an ongoing DDL? If true then answer to > your > > question is Yes. > > > > Suppose you have a prepared transaction that has a DDL in it, again if > this > > can be considered an on going DDL, then again answer to your question is > > Yes. > > > > Suppose you have a two coordinator cluster. Assume one client connected > to > > each. One client starts a transaction and issues a DDL, the second client > > issues a lock cluster command, the first commits the transaction. If > this is > > an ongoing DDL, then the answer to your question is No. 
But its a matter > of > > deciding which camp are we going to put COMMIT in, the allow camp, or the > > deny camp. I decided to put it in allow camp, because I have not yet > written > > any code to detect whether a transaction being committed has a DDL in it > or > > not, and stopping all transactions from committing looks too restrictive > to > > me. > > > > Do you have some other meaning of an ongoing DDL? > > > > I agree that we should have discussed this on the right thread. Lets > > continue this discussion on that thread. > > Continued on the other thread. > > > > >> > >> > >>> > >>> > >>> 4) Connect to any of the existing coordinators and take backup of the > >>> database > >>> ./pg_dump -p 5432 -C -s > >>> --file=/home/edb/Desktop/NodeAddition/dumps/101_all_objects_coord.sql > test > >>> > >>> 5) Start the new coordinator specify --restoremode while starting the > >>> coordinator > >>> ./postgres --restoremode -D ../data_cord3 -p 5455 > >>> > >>> 6) connect to the new coordinator directly > >>> ./psql postgres -p 5455 > >>> > >>> 7) create all the datanodes and the rest of the coordinators on the > new > >>> coordiantor & reload configuration > >>> CREATE NODE DATA_NODE_1 WITH (HOST = 'localhost', type = > >>> 'datanode', PORT = 15432, PRIMARY); > >>> CREATE NODE DATA_NODE_2 WITH (HOST = 'localhost', type = > >>> 'datanode', PORT = 25432); > >>> > >>> CREATE NODE COORD_1 WITH (HOST = 'localhost', type = > >>> 'coordinator', PORT = 5432); > >>> CREATE NODE COORD_2 WITH (HOST = 'localhost', type = > >>> 'coordinator', PORT = 5433); > >>> > >>> SELECT pgxc_pool_reload(); > >>> > >>> 8) quit psql > >>> > >>> 9) Create the new database on the new coordinator > >>> ./createdb test -p 5455 > >>> > >>> 10) create the roles and table spaces manually, the dump does not > contain > >>> roles or table spaces > >>> ./psql test -p 5455 > >>> CREATE ROLE admin WITH LOGIN CREATEDB CREATEROLE; > >>> CREATE TABLESPACE my_space LOCATION > >>> '/usr/local/pgsql/my_space_location'; > >>> \q > >>> > >> > >> Will pg_dumpall help ? It dumps roles also. > > > > > > Yah , but I am giving example of pg_dump so this step has to be there. > > > >> > >> > >> > >>> > >>> 11) Restore the backup that was taken from an existing coordinator by > >>> connecting to the new coordinator directly > >>> ./psql -d test -f > >>> /home/edb/Desktop/NodeAddition/dumps/101_all_objects_coord.sql -p 5455 > >>> > >>> 11) Quit the new coordinator > >>> > >>> 12) Connect to any of the existing coordinators & unlock the cluster > >>> ./psql postgres -p 5432 > >>> SET xc_lock_for_backup=no; > >>> \q > >>> > >> > >> Unlocking the cluster has to be done *after* the node is added into the > >> cluster. > > > > > > Very true. I stand corrected. This means CREATE NODE has to be allowed > when > > xc_lock_for_backup is set. 
> > > >> > >> > >> > >>> > >>> 13) Start the new coordinator as a by specifying --coordinator > >>> ./postgres --coordinator -D ../data_cord3 -p 5455 > >>> > >>> 14) Create the new coordinator on rest of the coordinators and reload > >>> configuration > >>> CREATE NODE COORD_3 WITH (HOST = 'localhost', type = > >>> 'coordinator', PORT = 5455); > >>> SELECT pgxc_pool_reload(); > >>> > >>> 15) The new coordinator is now ready > >>> ./psql test -p 5455 > >>> create table test_new_coord(a int, b int); > >>> \q > >>> ./psql test -p 5432 > >>> select * from test_new_coord; > >>> > >>> > >>> Here are the steps to add a new datanode > >>> > >>> > >>> 1) Initdb new datanode > >>> /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data3 > --nodename > >>> data_node_3 > >>> > >>> 2) Make necessary changes in its postgresql.conf, in particular > specify > >>> new datanode name > >>> > >>> 3) Connect to any of the existing coordinators & lock the cluster for > >>> backup > >>> ./psql postgres -p 5432 > >>> SET xc_lock_for_backup=yes; > >>> \q > >>> > >>> 4) Connect to any of the existing datanodes and take backup of the > >>> database > >>> ./pg_dump -p 15432 -C -s > >>> --file=/home/edb/Desktop/NodeAddition/dumps/102_all_objects_dn1.sql > test > >>> > >>> 5) Start the new datanode specify --restoremode while starting the it > >>> ./postgres --restoremode -D ../data3 -p 35432 > >>> > >>> 6) Create the new database on the new datanode > >>> ./createdb test -p 35432 > >>> > >>> 7) create the roles and table spaces manually, the dump does not > contain > >>> roles or table spaces > >>> ./psql test -p 35432 > >>> CREATE ROLE admin WITH LOGIN CREATEDB CREATEROLE; > >>> CREATE TABLESPACE my_space LOCATION > >>> '/usr/local/pgsql/my_space_location'; > >>> \q > >>> > >>> 8) Restore the backup that was taken from an existing datanode by > >>> connecting to the new datanode directly > >>> ./psql -d test -f > >>> /home/edb/Desktop/NodeAddition/dumps/102_all_objects_dn1.sql -p 35432 > >>> > >>> 9) Quit the new datanode > >>> > >>> 10) Connect to any of the existing coordinators & unlock the cluster > >>> ./psql postgres -p 5432 > >>> SET xc_lock_for_backup=no; > >>> \q > >>> > >>> 11) Start the new datanode as a datanode by specifying --datanode > >>> ./postgres --datanode -D ../data3 -p 35432 > >>> > >>> 12) Create the new datanode on all the coordinators and reload > >>> configuration > >>> CREATE NODE DATA_NODE_3 WITH (HOST = 'localhost', type = > >>> 'datanode', PORT = 35432); > >>> SELECT pgxc_pool_reload(); > >>> > >>> 13) Redistribute data by using ALTER TABLE REDISTRIBUTE > >>> > >>> 14) The new daatnode is now ready > >>> ./psql test > >>> create table test_new_dn(a int, b int) distribute by > replication; > >>> insert into test_new_dn values(1,2); > >>> EXECUTE DIRECT ON (data_node_1) 'SELECT * from test_new_dn'; > >>> EXECUTE DIRECT ON (data_node_2) 'SELECT * from test_new_dn'; > >>> EXECUTE DIRECT ON (data_node_3) 'SELECT * from test_new_dn'; > >>> > >>> Please note that the steps assume that the patch sent earlier > >>> 1_lock_cluster.patch in mail subject [Patch to lock cluster] is > applied. > >>> > >>> I have also attached test database scripts, that would help in patch > >>> review. > >>> > >>> Comments are welcome. 
-- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company |
From: Abbas B. <abb...@en...> - 2013-03-27 11:55:19
|
Bug ID 3608374 On Fri, Mar 8, 2013 at 12:25 PM, Abbas Butt <abb...@en...>wrote: > Attached please find revised patch that provides the following in addition > to what it did earlier. > > 1. Uses GetPreferredReplicationNode() instead of list_truncate() > 2. Adds test cases to xc_alter_table and xc_copy. > > I tested the following in reasonable detail to find whether any other > caller of GetRelationNodes() needs some fixing or not and found that none > of the other callers needs any more fixing. > I tested > a) copy > b) alter table redistribute > c) utilities > d) dmls etc > > However while testing ALTER TABLE, I found that replicated to hash is not > working correctly. > > This test case fails, since only SIX rows are expected in the final result. > > test=# create table t_r_n12(a int, b int) distribute by replication to > node (DATA_NODE_1, DATA_NODE_2); > CREATE TABLE > test=# insert into t_r_n12 values(1,777),(3,4),(5,6),(20,30),(NULL,999), > (NULL, 999); > INSERT 0 6 > test=# -- rep to hash > test=# ALTER TABLE t_r_n12 distribute by hash(a); > ALTER TABLE > test=# SELECT * FROM t_r_n12 order by 1; > a | b > ----+----- > 1 | 777 > 3 | 4 > 5 | 6 > 20 | 30 > | 999 > | 999 > | 999 > | 999 > (8 rows) > > test=# drop table t_r_n12; > DROP TABLE > > I have added a source forge bug tracker id to this case (Artifact 3607290<https://sourceforge.net/tracker/?func=detail&aid=3607290&group_id=311227&atid=1310232>). > The reason for this error is that the function distrib_delete_hash does not > take into account that the distribution column can be null. I will provide > a separate fix for that one. > Regression shows no extra failure except that test case xc_alter_table > would fail until 3607290 is fixed. > > Regards > > > > On Mon, Feb 25, 2013 at 10:18 AM, Ashutosh Bapat < > ash...@en...> wrote: > >> Thanks a lot Abbas for this quick fix. >> >> I am sorry, it's caused by my refactoring of GetRelationNodes(). >> >> If possible, can you please examine the other callers of >> GetRelationNodes() which would face the problems, esp. the ones for DML and >> utilities. This is other instance, where deciding the nodes to execute on >> at the time of execution will help. >> >> About the fix >> Can you please use GetPreferredReplicationNode() instead of >> list_truncate()? It will pick the preferred node instead of first one. If >> you find more places where we need this fix, it might be better to create a >> wrapper function and use it at those places. >> >> On Sat, Feb 23, 2013 at 2:59 PM, Abbas Butt <abb...@en...>wrote: >> >>> Hi, >>> PFA a patch to fix a crash when COPY TO is used on a replicated table. >>> >>> This test case produces a crash >>> >>> create table tab_rep(a int, b int) distribute by replication; >>> insert into tab_rep values(1,2), (3,4), (5,6), (7,8); >>> COPY tab_rep (a, b) TO stdout; >>> >>> Here is a description of the problem and the fix >>> In case of a read from a replicated table GetRelationNodes() >>> returns all nodes and expects that the planner can choose >>> one depending on the rest of the join tree. >>> In case of COPY TO we should choose the first one in the node list >>> This fixes a system crash and makes pg_dump work fine. 
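A sketch of the call-site pattern being discussed, i.e. use GetPreferredReplicationNode() rather than list_truncate() when a read on a replicated table must settle on a single node. The ExecNodes field names and the exact signature of GetPreferredReplicationNode() are assumed from this thread, so this is illustrative rather than the committed fix.

#include "postgres.h"
/*
 * ExecNodes and GetPreferredReplicationNode() come from Postgres-XC's
 * locator code; the names below are used as they appear in this thread.
 */

static void
pick_single_node_for_replicated_read(ExecNodes *exec_nodes)
{
    /*
     * Every node of a replicated table holds the same rows, so a read that
     * cannot leave the choice to the planner (COPY TO here) should keep just
     * one node -- preferably the "preferred" one, rather than whatever
     * happens to be first, which is what list_truncate(nodeList, 1) gives.
     */
    if (list_length(exec_nodes->nodeList) > 1)
        exec_nodes->nodeList =
            GetPreferredReplicationNode(exec_nodes->nodeList);
}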
-- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company |
From: Ashutosh B. <ash...@en...> - 2013-03-27 06:51:39
|
Another problem we will encounter is, what if the memory is not enough to merge runs from all the nodes. We are already seeing 20 node configurations, and that would grow, I guess. In such situations we need to start with these as initial runs input to "polyphase sorting" algorithm by Knuth. On Mon, Mar 25, 2013 at 4:43 PM, Ashutosh Bapat < ash...@en...> wrote: > Hi All, > I am working on using remote sorting for merge joins. The idea is while > using merge join at the coordinator, get the data sorted from the > datanodes; for replicated relations, we can get all the rows sorted and for > distributed tables we have to get sorted runs which can be merged at the > coordinator. For merge join the sorted inner relation needs to be randomly > accessible. For replicated relations this can be achieved by materialising > the result. But for distributed relations, we do not materialise the sorted > result at coordinator but compute the sorted result by merging the sorted > results from individual nodes on the fly. For distributed relations, the > connection to the datanodes themselves are used as logical tapes (which > provide the sorted runs). The final result is computed on the fly by > choosing the smallest or greatest row (as required) from the connections. > > For a Sort node the materialised result can reside in memory (if it fits > there) or on one of the logical tapes used for merge sort. So, in order to > provide random access to the sorted result, we need to materialise the > result either in the memory or on the logical tape. In-memory > materialisation is not easily possible since we have already resorted for > tape based sort, in case of distributed relations and to materialise the > result on tape, there is no logical tape available in current algorithm. To > make it work, there are following possible ways > > 1. When random access is required, materialise the sorted runs from > individual nodes onto tapes (one tape for each node) and then merge them on > one extra tape, which can be used for materialisation. > 2. Use a mix of connections and logical tape in the same tape set. Merge > the sorted runs from connections on a logical tape in the same logical tape > set. > > While the second one looks attractive from performance perspective (it > saves writing and reading from the tape), it would make the merge code ugly > by using mixed tapes. The read calls for connection and logical tape are > different and we will need both on the logical tape where the final result > is materialized. So, I am thinking of going with 1, in fact, to have same > code to handle remote sort, use 1 in all cases (whether or not > materialization is required). > > Had original authors of remote sort code thought about this > materialization? Anything they can share on this topic? > Any comment? > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
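For readers unfamiliar with the technique, here is a small standalone model of the k-way merge being described: each datanode connection is treated as an already-sorted run, and the coordinator repeatedly pulls the smallest head element through a binary heap. It is plain C for illustration only, not Postgres-XC code.

#include <stdio.h>
#include <stdlib.h>

typedef struct Run { const int *vals; int len; int pos; } Run;
typedef struct HeapItem { int key; int run; } HeapItem;

static void heap_sift_down(HeapItem *h, int n, int i)
{
    for (;;)
    {
        int l = 2 * i + 1, r = l + 1, m = i;
        if (l < n && h[l].key < h[m].key) m = l;
        if (r < n && h[r].key < h[m].key) m = r;
        if (m == i) return;
        HeapItem tmp = h[i]; h[i] = h[m]; h[m] = tmp;
        i = m;
    }
}

/* Merge nruns sorted runs, printing the globally sorted stream. */
static void merge_runs(Run *runs, int nruns)
{
    HeapItem *heap = malloc(sizeof(HeapItem) * nruns);
    int n = 0;

    for (int i = 0; i < nruns; i++)
        if (runs[i].len > 0)
            heap[n++] = (HeapItem){ runs[i].vals[0], i };
    for (int i = n / 2 - 1; i >= 0; i--)
        heap_sift_down(heap, n, i);

    while (n > 0)
    {
        Run *src = &runs[heap[0].run];
        printf("%d ", heap[0].key);
        if (++src->pos < src->len)
            heap[0].key = src->vals[src->pos];   /* pull next row from that node */
        else
            heap[0] = heap[--n];                 /* that node's run is exhausted */
        heap_sift_down(heap, n, 0);
    }
    free(heap);
    printf("\n");
}

int main(void)
{
    /* three "datanodes", each returning its rows already sorted */
    const int n1[] = {1, 4, 9}, n2[] = {2, 3, 10}, n3[] = {5, 6, 7};
    Run runs[] = { {n1, 3, 0}, {n2, 3, 0}, {n3, 3, 0} };

    merge_runs(runs, 3);
    /*
     * With many more nodes than the merge heap/memory budget allows, runs
     * would first have to be combined into fewer, longer runs over several
     * passes -- the polyphase-merge situation referred to above.
     */
    return 0;
}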
From: Ashutosh B. <ash...@en...> - 2013-03-26 10:23:23
|
On Tue, Mar 26, 2013 at 8:56 AM, Amit Khandekar < ami...@en...> wrote: > > > On 4 March 2013 11:11, Amit Khandekar <ami...@en...>wrote: > >> On 1 March 2013 13:53, Nikhil Sontakke <ni...@st...> wrote: >> >> >> >> Issue: Whether we should fetch the whole from the datanode (OLD row) >> and not >> >> just ctid and node_id and required columns and store it at the >> coordinator >> >> for the processing OR whether we should fetch each row (OLD and NEW >> >> variants) while processing each row. >> >> >> >> Both of them have performance impacts - the first one has disk impact >> for >> >> large number of rows whereas the second has network impact for querying >> >> rows. Is it possible to do some analytical assessment as to which of >> them >> >> would be better? If you can come up with something concrete (may be >> numbers >> >> or formulae) we will be able to judge better as to which one to pick >> up. >> >> Will check if we can come up with some sensible analysis or figures. >> >> > I have done some analysis on both of these approaches here: > > https://docs.google.com/document/d/10QPPq_go_wHqKqhmOFXjJAokfdLR8OaUyZVNDu47GWk/edit?usp=sharing > > In practical terms, we anyways would need to implement (B). The reason is > because when the trigger has conditional execution(WHEN clause) we *have* > to fetch the rows beforehand, so there is no point in fetching all of them > again at the end of the statement when we already have them locally. So may > be it would be too ambitious to have have both implementations, at least > for this release. > > I agree here. We can certainly optimize for various cases later, but we should have something which would give all the functionality (albeit at a lower performance for now). > So I am focussing on (B) right now. We have two options: > > 1. Store all rows in palloced memory, and save the HeapTuple pointers in > the trigger queue, and directly access the OLD and NEW rows using these > pointers when needed. Here we will have no control over how much memory we > should use for the old and new records, and this might even hamper system > performance, let alone XC performance. > 2. Other option is to use tuplestore. Here, we need to store the positions > of the records in the tuplestore. So for a particular tigger event, fetch > by the position. From what I understand, tuplestore can be advanced only > sequentially in either direction. So when the read pointer is at position 6 > and we need to fetch a record at position 10, we need to call > tuplestore_advance() 4 times, and this call involves palloc/pfree overhead > because it calls tuplestore_gettuple(). But the trigger records are not > distributed so randomly. In fact a set of trigger events for a particular > event id are accessed in the same order as the order in which they are > queued. So for a particular event id, only the first access call will > require random access. tuplestore supports multiple read pointers, so may > be we can make use of that to access the first record using the closest > read pointer. > > Using palloc will be a problem if the size of data fetched is more that what could fit in memory. Also pallocing frequently is going to be performance problem. Let's see how does the tuple store approach go. > > >> >> >> > >> > Or we can consider a hybrid approach of getting the rows in batches of >> > 1000 or so if possible as well. That ways they get into coordinator >> > memory in one shot and can be processed in batches. 
Obviously this >> > should be considered if it's not going to be a complicated >> > implementation. >> >> It just occurred to me that it would not be that hard to optimize the >> row-fetching-by-ctid as shown below: >> 1. When it is time to fire the queued triggers at the >> statement/transaction end, initialize cursors - one cursor per >> datanode - which would do: SELECT remote_heap_fetch(table_name, >> '<ctidlist>'); We can form this ctidlist out of the trigger even list. >> 2. For each trigger event entry in the trigger queue, FETCH NEXT using >> the appropriate cursor name according to the datanode id to which the >> trigger entry belongs. >> >> > >> >>> Currently we fetch all attributes in the SELECT subplans. I have >> >>> created another patch to fetch only the required attribtues, but have >> >>> not merged that into this patch. >> > >> > Do we have other places where we unnecessary fetch all attributes? >> > ISTM, this should be fixed as a performance improvement first ahead of >> > everything else. >> >> I believe DML subplan is the only remaining place where we fetch all >> attributes. And yes, this is a must-have for triggers, otherwise, the >> other optimizations would be of no use. >> >> > >> >>> 2. One important TODO for BEFORE trigger is this: Just before >> >>> invoking the trigger functions, in PG, the tuple is row-locked >> >>> (exclusive) by GetTupleTrigger() and the locked version is fetched >> >>> from the table. So it is made sure that while all the triggers for >> >>> that table are executed, no one can update that particular row. >> >>> In the patch, we haven't locked the row. We need to lock it either by >> >>> executing : >> >>> 1. SELECT * from tab1 where ctid = <ctid_val> FOR UPDATE, and then >> >>> use the returned ROW as the OLD row. >> >>> OR >> >>> 2. The UPDATE subplan itself should have SELECT for UPDATE so that >> >>> the row is already locked, and we don't have to lock it again. >> >>> #2 is simple though it might cause some amount of longer waits in >> general. >> >>> Using #1, though the locks would be acquired only when the particular >> >>> row is updated, the locks would be released only after transaction >> >>> end, so #1 might not be worth implementing. >> >>> Also #1 requires another explicit remote fetch for the >> >>> lock-and-get-latest-version operation. >> >>> I am more inclined towards #2. >> >>> >> >> The option #2 however, has problem of locking too many rows if there >> are >> >> coordinator quals in the subplans IOW the number of rows finally >> updated are >> >> lesser than the number of rows fetched from the datanode. It can cause >> >> unwanted deadlocks. Unless there is a way to release these extra >> locks, I am >> >> afraid this option will be a problem. >> >> True. Regardless of anything else - whether it is deadlocks or longer >> waits, we should not lock rows that are not to be updated. >> >> There is a more general row-locking issue that we need to solve first >> : 3606317. I anticipate that solving this will solve the trigger >> specific lock issue. So for triggers, this is a must-have, and I am >> going to solve this issue as part of this bug 3606317. >> >> >> >> > Deadlocks? ISTM, we can get more lock waits because of this but I do >> > not see deadlock scenarios.. >> > >> > With the FQS shipping work being done by Ashutosh, will we also ship >> > major chunks of subplans to the datanodes? If yes, then row locking >> > will only involve required tuples (hopefully) from the coordinator's >> > point of view. 
>> > >> > Also, something radical is can be invent a new type of FOR [NODE] >> > UPDATE type lock to minimize the impact of such locking of rows on >> > datanodes? >> > >> > Regards, >> > Nikhils >> > >> >>> >> >>> 3. The BEFORE trigger function can change the distribution column >> >>> itself. We need to add a check at the end of the trigger executions. >> >>> >> >> >> >> Good, you thought about that. Yes we should check it. >> >> >> >>> >> >>> 4. Fetching OLD row for WHEN clause handling. >> >>> >> >>> 5. Testing with mix of Shippable and non-shippable ROW triggers >> >>> >> >>> 6. Other types of triggers. INSTEAD triggers are anticipated to work >> >>> without significant changes, but they are yet to be tested. >> >>> INSERT/DELETE triggers: Most of the infrastructure has been done while >> >>> implementing UPDATE triggers. But some changes specific to INSERT and >> >>> DELETE are yet to be done. >> >>> Deferred triggers to be tested. >> >>> >> >>> 7. Regression analysis. There are some new failures. Will post another >> >>> fair version of the patch after regression analysis and fixing various >> >>> TODOs. >> >>> >> >>> Comments welcome. >> >>> >> >>> >> >>> >> ------------------------------------------------------------------------------ >> >>> Everyone hates slow websites. So do we. >> >>> Make your web apps faster with AppDynamics >> >>> Download AppDynamics Lite for free today: >> >>> http://p.sf.net/sfu/appdyn_d2d_feb >> >>> _______________________________________________ >> >>> Postgres-xc-developers mailing list >> >>> Pos...@li... >> >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >>> >> >> >> >> >> >> >> >> -- >> >> Best Wishes, >> >> Ashutosh Bapat >> >> EntepriseDB Corporation >> >> The Enterprise Postgres Company >> >> >> >> >> ------------------------------------------------------------------------------ >> >> Everyone hates slow websites. So do we. >> >> Make your web apps faster with AppDynamics >> >> Download AppDynamics Lite for free today: >> >> http://p.sf.net/sfu/appdyn_d2d_feb >> >> _______________________________________________ >> >> Postgres-xc-developers mailing list >> >> Pos...@li... >> >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> >> > >> > >> > >> > -- >> > StormDB - http://www.stormdb.com >> > The Database Cloud >> > Postgres-XC Support and Service >> > > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
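A rough sketch of option 2 from the discussion above, i.e. queueing only the tuplestore position of each OLD/NEW row instead of keeping palloc'd tuples. The struct and helper are invented names for illustration; only the tuplestore_* calls are stock PostgreSQL API.

#include "postgres.h"
#include "utils/tuplestore.h"
#include "executor/tuptable.h"

/*
 * Illustration only: remember where each OLD/NEW row sits in a tuplestore.
 * The store itself would be created once per query with, e.g.:
 *     store = tuplestore_begin_heap(true, false, work_mem);
 * (randomAccess = true, so it spills to disk but stays re-readable).
 */
typedef struct XCTriggerEvent
{
    int         nodeid;        /* datanode the row came from */
    int64       old_row_pos;   /* position of OLD row in the store, or -1 */
    int64       new_row_pos;   /* position of NEW row in the store, or -1 */
} XCTriggerEvent;

static int64
xc_append_trigger_row(Tuplestorestate *store, int64 *next_pos,
                      TupleTableSlot *slot)
{
    tuplestore_puttupleslot(store, slot);
    return (*next_pos)++;      /* caller records this in old_row_pos/new_row_pos */
}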
From: Koichi S. <koi...@gm...> - 2013-03-26 05:41:18
|
Understood the situation. Bulk row transfer between coordinator/datanode is another infrastructure we need for sure. This will fit 10G network (we need to use giant packet to use its bandwidth). Regards; ---------- Koichi Suzuki 2013/3/26 Ashutosh Bapat <ash...@en...>: > > > On Tue, Mar 26, 2013 at 10:19 AM, Koichi Suzuki <koi...@gm...> > wrote: >> >> On thing we should think for option 1 is: >> >> When a number of the result is huge, applications has to wait long >> time until they get the first row. Because this option may need disk >> write, total resource consumption will be larger. >> > > Yes, I am aware of this fact. Please read the next paragraph and you will > see that the current situation is no better. > >> >> I'm wondering if we can use "cursor" at database so that we can read >> each tape more simply, I mean, to leave each query node open and read >> next row from any query node. >> > > We do that right now. But because of such a simulated cursor (it's not > cursor per say, but we just fetch the required result from connection as the > demand arises in merging runs), we observer following things > > If the plan has multiple remote query nodes (as there will be in case of > merge join), we assign the same connection to these nodes. Before this > assignment, the result from the previous connection is materialised at the > coordinator. This means that, when we will get huge result from the > datanode, it will be materialised (which will have the more cost as > materialising it on tape, as this materialisation happens in a linked list, > which is not optimized). We need to share connection between more than one > RemoteQuery node because same transaction can not work on two connections to > same server. Not only performance, but the code has become ugly because of > this approach. At various places in executor, we have special handling for > sorting, which needs to be maintained. > > Instead if we materialise all the result on tape and then proceed with step > D5 in Knuth's algorithm for polyphase merge sort, the code will be much > simpler and we won't loose much performance. In fact, we might be able to > leverage fetching bulk data on connection which can be materialised on tape > in bulk. > >> >> Regards; >> ---------- >> Koichi Suzuki >> >> >> 2013/3/25 Ashutosh Bapat <ash...@en...>: >> > Hi All, >> > I am working on using remote sorting for merge joins. The idea is while >> > using merge join at the coordinator, get the data sorted from the >> > datanodes; >> > for replicated relations, we can get all the rows sorted and for >> > distributed >> > tables we have to get sorted runs which can be merged at the >> > coordinator. >> > For merge join the sorted inner relation needs to be randomly >> > accessible. >> > For replicated relations this can be achieved by materialising the >> > result. >> > But for distributed relations, we do not materialise the sorted result >> > at >> > coordinator but compute the sorted result by merging the sorted results >> > from >> > individual nodes on the fly. For distributed relations, the connection >> > to >> > the datanodes themselves are used as logical tapes (which provide the >> > sorted >> > runs). The final result is computed on the fly by choosing the smallest >> > or >> > greatest row (as required) from the connections. >> > >> > For a Sort node the materialised result can reside in memory (if it fits >> > there) or on one of the logical tapes used for merge sort. 
So, in order >> > to >> > provide random access to the sorted result, we need to materialise the >> > result either in the memory or on the logical tape. In-memory >> > materialisation is not easily possible since we have already resorted >> > for >> > tape based sort, in case of distributed relations and to materialise the >> > result on tape, there is no logical tape available in current algorithm. >> > To >> > make it work, there are following possible ways >> > >> > 1. When random access is required, materialise the sorted runs from >> > individual nodes onto tapes (one tape for each node) and then merge them >> > on >> > one extra tape, which can be used for materialisation. >> > 2. Use a mix of connections and logical tape in the same tape set. Merge >> > the >> > sorted runs from connections on a logical tape in the same logical tape >> > set. >> > >> > While the second one looks attractive from performance perspective (it >> > saves >> > writing and reading from the tape), it would make the merge code ugly by >> > using mixed tapes. The read calls for connection and logical tape are >> > different and we will need both on the logical tape where the final >> > result >> > is materialized. So, I am thinking of going with 1, in fact, to have >> > same >> > code to handle remote sort, use 1 in all cases (whether or not >> > materialization is required). >> > >> > Had original authors of remote sort code thought about this >> > materialization? >> > Anything they can share on this topic? >> > Any comment? >> > -- >> > Best Wishes, >> > Ashutosh Bapat >> > EntepriseDB Corporation >> > The Enterprise Postgres Company >> > >> > >> > ------------------------------------------------------------------------------ >> > Everyone hates slow websites. So do we. >> > Make your web apps faster with AppDynamics >> > Download AppDynamics Lite for free today: >> > http://p.sf.net/sfu/appdyn_d2d_mar >> > _______________________________________________ >> > Postgres-xc-developers mailing list >> > Pos...@li... >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> > > > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company |
From: Ashutosh B. <ash...@en...> - 2013-03-26 05:08:26
|
On Tue, Mar 26, 2013 at 10:19 AM, Koichi Suzuki <koi...@gm...>wrote: > On thing we should think for option 1 is: > > When a number of the result is huge, applications has to wait long > time until they get the first row. Because this option may need disk > write, total resource consumption will be larger. > > Yes, I am aware of this fact. Please read the next paragraph and you will see that the current situation is no better. > I'm wondering if we can use "cursor" at database so that we can read > each tape more simply, I mean, to leave each query node open and read > next row from any query node. > > We do that right now. But because of such a simulated cursor (it's not cursor per say, but we just fetch the required result from connection as the demand arises in merging runs), we observer following things If the plan has multiple remote query nodes (as there will be in case of merge join), we assign the same connection to these nodes. Before this assignment, the result from the previous connection is materialised at the coordinator. This means that, when we will get huge result from the datanode, it will be materialised (which will have the more cost as materialising it on tape, as this materialisation happens in a linked list, which is not optimized). We need to share connection between more than one RemoteQuery node because same transaction can not work on two connections to same server. Not only performance, but the code has become ugly because of this approach. At various places in executor, we have special handling for sorting, which needs to be maintained. Instead if we materialise all the result on tape and then proceed with step D5 in Knuth's algorithm for polyphase merge sort, the code will be much simpler and we won't loose much performance. In fact, we might be able to leverage fetching bulk data on connection which can be materialised on tape in bulk. > Regards; > ---------- > Koichi Suzuki > > > 2013/3/25 Ashutosh Bapat <ash...@en...>: > > Hi All, > > I am working on using remote sorting for merge joins. The idea is while > > using merge join at the coordinator, get the data sorted from the > datanodes; > > for replicated relations, we can get all the rows sorted and for > distributed > > tables we have to get sorted runs which can be merged at the coordinator. > > For merge join the sorted inner relation needs to be randomly accessible. > > For replicated relations this can be achieved by materialising the > result. > > But for distributed relations, we do not materialise the sorted result at > > coordinator but compute the sorted result by merging the sorted results > from > > individual nodes on the fly. For distributed relations, the connection to > > the datanodes themselves are used as logical tapes (which provide the > sorted > > runs). The final result is computed on the fly by choosing the smallest > or > > greatest row (as required) from the connections. > > > > For a Sort node the materialised result can reside in memory (if it fits > > there) or on one of the logical tapes used for merge sort. So, in order > to > > provide random access to the sorted result, we need to materialise the > > result either in the memory or on the logical tape. In-memory > > materialisation is not easily possible since we have already resorted for > > tape based sort, in case of distributed relations and to materialise the > > result on tape, there is no logical tape available in current algorithm. > To > > make it work, there are following possible ways > > > > 1. 
When random access is required, materialise the sorted runs from > > individual nodes onto tapes (one tape for each node) and then merge them > on > > one extra tape, which can be used for materialisation. > > 2. Use a mix of connections and logical tape in the same tape set. Merge > the > > sorted runs from connections on a logical tape in the same logical tape > set. > > > > While the second one looks attractive from performance perspective (it > saves > > writing and reading from the tape), it would make the merge code ugly by > > using mixed tapes. The read calls for connection and logical tape are > > different and we will need both on the logical tape where the final > result > > is materialized. So, I am thinking of going with 1, in fact, to have same > > code to handle remote sort, use 1 in all cases (whether or not > > materialization is required). > > > > Had original authors of remote sort code thought about this > materialization? > > Anything they can share on this topic? > > Any comment? > > -- > > Best Wishes, > > Ashutosh Bapat > > EntepriseDB Corporation > > The Enterprise Postgres Company > > > > > ------------------------------------------------------------------------------ > > Everyone hates slow websites. So do we. > > Make your web apps faster with AppDynamics > > Download AppDynamics Lite for free today: > > http://p.sf.net/sfu/appdyn_d2d_mar > > _______________________________________________ > > Postgres-xc-developers mailing list > > Pos...@li... > > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
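As an illustration of option 1 (materialise the merged stream once so it can be read randomly afterwards), the sketch below uses a random-access tuplestore as a stand-in for the extra logical tape. The fetch callback is a placeholder for the on-the-fly merge of per-connection runs; the tuplestore_* calls are real PostgreSQL API.

#include "postgres.h"
#include "miscadmin.h"             /* work_mem */
#include "utils/tuplestore.h"
#include "executor/tuptable.h"

/* Placeholder for the on-the-fly merge that returns the next smallest row
 * pulled from the datanode connections. */
typedef bool (*FetchNextMergedRow) (TupleTableSlot *slot);

static Tuplestorestate *
materialize_sorted_stream(FetchNextMergedRow fetch_next, TupleTableSlot *slot)
{
    /* randomAccess = true, so the materialised result can be rescanned */
    Tuplestorestate *store = tuplestore_begin_heap(true, false, work_mem);

    while (fetch_next(slot))
        tuplestore_puttupleslot(store, slot);

    return store;
}

/*
 * Later, e.g. on a rescan of the materialised sort:
 *     tuplestore_rescan(store);
 *     while (tuplestore_gettupleslot(store, true, false, slot))
 *         ... feed the merge join ...
 */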
From: Koichi S. <koi...@gm...> - 2013-03-26 04:49:19
|
On thing we should think for option 1 is: When a number of the result is huge, applications has to wait long time until they get the first row. Because this option may need disk write, total resource consumption will be larger. I'm wondering if we can use "cursor" at database so that we can read each tape more simply, I mean, to leave each query node open and read next row from any query node. Regards; ---------- Koichi Suzuki 2013/3/25 Ashutosh Bapat <ash...@en...>: > Hi All, > I am working on using remote sorting for merge joins. The idea is while > using merge join at the coordinator, get the data sorted from the datanodes; > for replicated relations, we can get all the rows sorted and for distributed > tables we have to get sorted runs which can be merged at the coordinator. > For merge join the sorted inner relation needs to be randomly accessible. > For replicated relations this can be achieved by materialising the result. > But for distributed relations, we do not materialise the sorted result at > coordinator but compute the sorted result by merging the sorted results from > individual nodes on the fly. For distributed relations, the connection to > the datanodes themselves are used as logical tapes (which provide the sorted > runs). The final result is computed on the fly by choosing the smallest or > greatest row (as required) from the connections. > > For a Sort node the materialised result can reside in memory (if it fits > there) or on one of the logical tapes used for merge sort. So, in order to > provide random access to the sorted result, we need to materialise the > result either in the memory or on the logical tape. In-memory > materialisation is not easily possible since we have already resorted for > tape based sort, in case of distributed relations and to materialise the > result on tape, there is no logical tape available in current algorithm. To > make it work, there are following possible ways > > 1. When random access is required, materialise the sorted runs from > individual nodes onto tapes (one tape for each node) and then merge them on > one extra tape, which can be used for materialisation. > 2. Use a mix of connections and logical tape in the same tape set. Merge the > sorted runs from connections on a logical tape in the same logical tape set. > > While the second one looks attractive from performance perspective (it saves > writing and reading from the tape), it would make the merge code ugly by > using mixed tapes. The read calls for connection and logical tape are > different and we will need both on the logical tape where the final result > is materialized. So, I am thinking of going with 1, in fact, to have same > code to handle remote sort, use 1 in all cases (whether or not > materialization is required). > > Had original authors of remote sort code thought about this materialization? > Anything they can share on this topic? > Any comment? > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_d2d_mar > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > |
From: Koichi S. <koi...@gm...> - 2013-03-26 04:38:35
|
I think it's a good idea to disclose what assumption is the HA feature assumes, especially configuration, master slave connection for coordinator/datanode, use of VIP, interface to applications, etc. This will help people to be prepared. I'm also interested if it runs together with Pacemaker/Corosync. Thanks. ---------- Koichi Suzuki # I'm now writing a book on PostgreSQL HA. Do you think I can include this topic to the book? 2012/8/9 坂田 哲夫 <sak...@la...>: > Hi folks, > > We are now developing HA facility for XC with another team rather than XC > core development. Say them XC HA team. The facility will be released as open > source software by the end of this year. > > The HA facility will be developed through two phases as follow. > > 1. furnish fundamental function giving redundancy to XC core like as > streaming replication to make transactional data in data nodes durable, GTM > protecting XIDs with GTM stand-by against failures. These functions are > mainly developed by PostgreSQL and PG XC core development teams > respectively. > > 2. integrate PG XC and Linux-HA. When a component occurs failure, we would > like to fail over automatically. To realize this, XC HA team is going to > develop RAs (resource agents) for data node and coordinator for Linux-HA > (pacemaker) at first, then RA for GTM and GTM standby. > This integration (in other words, developing RAs) is done by XC HA team. > > So far, XC HA team does not provide its products including source codes, > documentation as open source. And its activities are not opened either > except some requirement to XC core which might have influence on XC design, > performance, operations and so on. XC HA team's activities will be opened > after we release our first HA facility as I mentioned. > > best regards, > Tetsuo Sakata. > > > > (2012/07/27 9:20), Koichi Suzuki wrote: >> >> I've heard another group is working together with Linux HA Japan to >> provide XC RA's. >> >> Sakata-san, could you provide related info if available? >> >> Regards; >> ---------- >> Koichi Suzuki >> >> >> 2012/7/24 Nikhil Sontakke <ni...@st...>: >>> >>> Hi, >>> >>> So what's the latest status of these HA activities for PGXC? >>> >>> Like the coordinator/datanode agents being discussed here, do we have >>> agents for GTM for example? Also is this happening in some open source >>> group where we can participate and see the latest and greatest source >>> changes? >>> >>> Regards, >>> Nikhils > > > -- > sakata.tetsuo _at_ lab.ntt.co.jp > SAKATA, Tetsuo. Shinagawa Tokyo JAPAN. |
From: Amit K. <ami...@en...> - 2013-03-26 03:26:53
|
On 4 March 2013 11:11, Amit Khandekar <ami...@en...>wrote: > On 1 March 2013 13:53, Nikhil Sontakke <ni...@st...> wrote: > >> > >> Issue: Whether we should fetch the whole from the datanode (OLD row) > and not > >> just ctid and node_id and required columns and store it at the > coordinator > >> for the processing OR whether we should fetch each row (OLD and NEW > >> variants) while processing each row. > >> > >> Both of them have performance impacts - the first one has disk impact > for > >> large number of rows whereas the second has network impact for querying > >> rows. Is it possible to do some analytical assessment as to which of > them > >> would be better? If you can come up with something concrete (may be > numbers > >> or formulae) we will be able to judge better as to which one to pick up. > > Will check if we can come up with some sensible analysis or figures. > > I have done some analysis on both of these approaches here: https://docs.google.com/document/d/10QPPq_go_wHqKqhmOFXjJAokfdLR8OaUyZVNDu47GWk/edit?usp=sharing In practical terms, we anyways would need to implement (B). The reason is because when the trigger has conditional execution(WHEN clause) we *have* to fetch the rows beforehand, so there is no point in fetching all of them again at the end of the statement when we already have them locally. So may be it would be too ambitious to have have both implementations, at least for this release. So I am focussing on (B) right now. We have two options: 1. Store all rows in palloced memory, and save the HeapTuple pointers in the trigger queue, and directly access the OLD and NEW rows using these pointers when needed. Here we will have no control over how much memory we should use for the old and new records, and this might even hamper system performance, let alone XC performance. 2. Other option is to use tuplestore. Here, we need to store the positions of the records in the tuplestore. So for a particular tigger event, fetch by the position. From what I understand, tuplestore can be advanced only sequentially in either direction. So when the read pointer is at position 6 and we need to fetch a record at position 10, we need to call tuplestore_advance() 4 times, and this call involves palloc/pfree overhead because it calls tuplestore_gettuple(). But the trigger records are not distributed so randomly. In fact a set of trigger events for a particular event id are accessed in the same order as the order in which they are queued. So for a particular event id, only the first access call will require random access. tuplestore supports multiple read pointers, so may be we can make use of that to access the first record using the closest read pointer. > >> > > > > Or we can consider a hybrid approach of getting the rows in batches of > > 1000 or so if possible as well. That ways they get into coordinator > > memory in one shot and can be processed in batches. Obviously this > > should be considered if it's not going to be a complicated > > implementation. > > It just occurred to me that it would not be that hard to optimize the > row-fetching-by-ctid as shown below: > 1. When it is time to fire the queued triggers at the > statement/transaction end, initialize cursors - one cursor per > datanode - which would do: SELECT remote_heap_fetch(table_name, > '<ctidlist>'); We can form this ctidlist out of the trigger even list. > 2. 
For each trigger event entry in the trigger queue, FETCH NEXT using > the appropriate cursor name according to the datanode id to which the > trigger entry belongs. > > > > >>> Currently we fetch all attributes in the SELECT subplans. I have > >>> created another patch to fetch only the required attribtues, but have > >>> not merged that into this patch. > > > > Do we have other places where we unnecessary fetch all attributes? > > ISTM, this should be fixed as a performance improvement first ahead of > > everything else. > > I believe DML subplan is the only remaining place where we fetch all > attributes. And yes, this is a must-have for triggers, otherwise, the > other optimizations would be of no use. > > > > >>> 2. One important TODO for BEFORE trigger is this: Just before > >>> invoking the trigger functions, in PG, the tuple is row-locked > >>> (exclusive) by GetTupleTrigger() and the locked version is fetched > >>> from the table. So it is made sure that while all the triggers for > >>> that table are executed, no one can update that particular row. > >>> In the patch, we haven't locked the row. We need to lock it either by > >>> executing : > >>> 1. SELECT * from tab1 where ctid = <ctid_val> FOR UPDATE, and then > >>> use the returned ROW as the OLD row. > >>> OR > >>> 2. The UPDATE subplan itself should have SELECT for UPDATE so that > >>> the row is already locked, and we don't have to lock it again. > >>> #2 is simple though it might cause some amount of longer waits in > general. > >>> Using #1, though the locks would be acquired only when the particular > >>> row is updated, the locks would be released only after transaction > >>> end, so #1 might not be worth implementing. > >>> Also #1 requires another explicit remote fetch for the > >>> lock-and-get-latest-version operation. > >>> I am more inclined towards #2. > >>> > >> The option #2 however, has problem of locking too many rows if there are > >> coordinator quals in the subplans IOW the number of rows finally > updated are > >> lesser than the number of rows fetched from the datanode. It can cause > >> unwanted deadlocks. Unless there is a way to release these extra locks, > I am > >> afraid this option will be a problem. > > True. Regardless of anything else - whether it is deadlocks or longer > waits, we should not lock rows that are not to be updated. > > There is a more general row-locking issue that we need to solve first > : 3606317. I anticipate that solving this will solve the trigger > specific lock issue. So for triggers, this is a must-have, and I am > going to solve this issue as part of this bug 3606317. > > >> > > Deadlocks? ISTM, we can get more lock waits because of this but I do > > not see deadlock scenarios.. > > > > With the FQS shipping work being done by Ashutosh, will we also ship > > major chunks of subplans to the datanodes? If yes, then row locking > > will only involve required tuples (hopefully) from the coordinator's > > point of view. > > > > Also, something radical is can be invent a new type of FOR [NODE] > > UPDATE type lock to minimize the impact of such locking of rows on > > datanodes? > > > > Regards, > > Nikhils > > > >>> > >>> 3. The BEFORE trigger function can change the distribution column > >>> itself. We need to add a check at the end of the trigger executions. > >>> > >> > >> Good, you thought about that. Yes we should check it. > >> > >>> > >>> 4. Fetching OLD row for WHEN clause handling. > >>> > >>> 5. 
Testing with mix of Shippable and non-shippable ROW triggers > >>> > >>> 6. Other types of triggers. INSTEAD triggers are anticipated to work > >>> without significant changes, but they are yet to be tested. > >>> INSERT/DELETE triggers: Most of the infrastructure has been done while > >>> implementing UPDATE triggers. But some changes specific to INSERT and > >>> DELETE are yet to be done. > >>> Deferred triggers to be tested. > >>> > >>> 7. Regression analysis. There are some new failures. Will post another > >>> fair version of the patch after regression analysis and fixing various > >>> TODOs. > >>> > >>> Comments welcome. > >>> > >>> > >>> > ------------------------------------------------------------------------------ > >>> Everyone hates slow websites. So do we. > >>> Make your web apps faster with AppDynamics > >>> Download AppDynamics Lite for free today: > >>> http://p.sf.net/sfu/appdyn_d2d_feb > >>> _______________________________________________ > >>> Postgres-xc-developers mailing list > >>> Pos...@li... > >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > >>> > >> > >> > >> > >> -- > >> Best Wishes, > >> Ashutosh Bapat > >> EntepriseDB Corporation > >> The Enterprise Postgres Company > >> > >> > ------------------------------------------------------------------------------ > >> Everyone hates slow websites. So do we. > >> Make your web apps faster with AppDynamics > >> Download AppDynamics Lite for free today: > >> http://p.sf.net/sfu/appdyn_d2d_feb > >> _______________________________________________ > >> Postgres-xc-developers mailing list > >> Pos...@li... > >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > >> > > > > > > > > -- > > StormDB - http://www.stormdb.com > > The Database Cloud > > Postgres-XC Support and Service > |
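Complementing the queue-entry sketch earlier in this thread, here is an illustrative helper for the "closest read pointer" idea: keep a few read pointers on the trigger-row tuplestore and advance whichever one is nearest to (but not past) the requested position. The pointer bookkeeping and function name are hypothetical; the tuplestore_* calls are the real API.

#include "postgres.h"
#include "utils/tuplestore.h"
#include "executor/tuptable.h"

#define NUM_TRIG_READ_PTRS 4

/* Current position of each read pointer; pointer i > 0 was obtained earlier
 * with tuplestore_alloc_read_pointer(store, 0) (pointer 0 is the default). */
static int64 read_ptr_pos[NUM_TRIG_READ_PTRS];

static bool
xc_fetch_trigger_row(Tuplestorestate *store, int64 pos, TupleTableSlot *slot)
{
    int         i;
    int         best = -1;

    /* pick the pointer closest to, but not beyond, the requested position */
    for (i = 0; i < NUM_TRIG_READ_PTRS; i++)
        if (read_ptr_pos[i] <= pos &&
            (best < 0 || read_ptr_pos[i] > read_ptr_pos[best]))
            best = i;
    if (best < 0)
        return false;           /* all pointers already past the target; a
                                 * backward step would be needed (omitted) */

    tuplestore_select_read_pointer(store, best);

    /* sequential catch-up; cheap when events are fetched roughly in order */
    while (read_ptr_pos[best] < pos)
    {
        if (!tuplestore_advance(store, true))
            return false;
        read_ptr_pos[best]++;
    }

    if (!tuplestore_gettupleslot(store, true, false, slot))
        return false;
    read_ptr_pos[best]++;       /* the fetch moved this pointer one row ahead */
    return true;
}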
From: Michael P. <mic...@gm...> - 2013-03-26 03:02:23
|
On Thu, Aug 9, 2012 at 2:50 PM, 坂田 哲夫 <sak...@la...> wrote: > Hi folks, > > We are now developing HA facility for XC with another team rather than > XC core development. Say them XC HA team. The facility will be released > as open source software by the end of this year. > > The HA facility will be developed through two phases as follow. > > 1. furnish fundamental function giving redundancy to XC core like as > streaming replication to make transactional data in data nodes durable, > GTM protecting XIDs with GTM stand-by against failures. These functions > are mainly developed by PostgreSQL and PG XC core development teams > respectively. > > 2. integrate PG XC and Linux-HA. When a component occurs failure, we > would like to fail over automatically. To realize this, XC HA team is > going to develop RAs (resource agents) for data node and coordinator > for Linux-HA (pacemaker) at first, then RA for GTM and GTM standby. > This integration (in other words, developing RAs) is done by XC HA team. > > So far, XC HA team does not provide its products including source codes, > documentation as open source. And its activities are not opened either > except some requirement to XC core which might have influence on XC > design, performance, operations and so on. XC HA team's activities will > be opened after we release our first HA facility as I mentioned. > In short, +1. ;) -- Michael |
From: 坂田 哲夫 <sak...@la...> - 2013-03-26 02:23:41
|
Hi folks, We are now developing HA facility for XC with another team rather than XC core development. Say them XC HA team. The facility will be released as open source software by the end of this year. The HA facility will be developed through two phases as follow. 1. furnish fundamental function giving redundancy to XC core like as streaming replication to make transactional data in data nodes durable, GTM protecting XIDs with GTM stand-by against failures. These functions are mainly developed by PostgreSQL and PG XC core development teams respectively. 2. integrate PG XC and Linux-HA. When a component occurs failure, we would like to fail over automatically. To realize this, XC HA team is going to develop RAs (resource agents) for data node and coordinator for Linux-HA (pacemaker) at first, then RA for GTM and GTM standby. This integration (in other words, developing RAs) is done by XC HA team. So far, XC HA team does not provide its products including source codes, documentation as open source. And its activities are not opened either except some requirement to XC core which might have influence on XC design, performance, operations and so on. XC HA team's activities will be opened after we release our first HA facility as I mentioned. best regards, Tetsuo Sakata. (2012/07/27 9:20), Koichi Suzuki wrote: > I've heard another group is working together with Linux HA Japan to > provide XC RA's. > > Sakata-san, could you provide related info if available? > > Regards; > ---------- > Koichi Suzuki > > > 2012/7/24 Nikhil Sontakke <ni...@st...>: >> Hi, >> >> So what's the latest status of these HA activities for PGXC? >> >> Like the coordinator/datanode agents being discussed here, do we have >> agents for GTM for example? Also is this happening in some open source >> group where we can participate and see the latest and greatest source >> changes? >> >> Regards, >> Nikhils -- sakata.tetsuo _at_ lab.ntt.co.jp SAKATA, Tetsuo. Shinagawa Tokyo JAPAN. |
From: Amit K. <ami...@en...> - 2013-03-25 12:04:14
|
On 25 March 2013 17:11, Ashutosh Bapat <ash...@en...>wrote: > > > On Mon, Mar 25, 2013 at 1:54 PM, Amit Khandekar < > ami...@en...> wrote: > >> Hi Ashutosh, >> >> I only have following few points: >> >> ------------- >> trailing whitespace characters in the patch. >> >> > Will take care of that. > > >> -------------- >> >> Re: "Subqueries and permission checks", Just to make sure I understand >> the scope of the patch ... >> The statement "In very near future, we should be able to use the >> infrastructure for FQS for shipping sub-queries without planning at the >> coordinator, the whole query is not shippable" , can you please elaborate >> on this point ? Is it that individual subqueries are not fast-shipped in >> some cases if the whole query is not shippable ? For e.g. : >> >> create table t2 (c1 int, c2 int, c3 int) distribute by hash(c2); >> create table t1 (c1 int, c2 int, c3 int) distribute by replication; >> >> explain select * from (select avg(c1), c2 from t2 group by c2) a1 join >> (select avg(c1), c2 from t1 group by c2) b1 using >> (c2); QUERY >> PLAN >> >> ---------------------------------------------------------------------------- >> Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 >> width=0) >> Node/s: datanode_1, datanode_2 >> >> Now if I disable FQS, the join does not get reduced. That means, if the >> above query is used as a subquery in a FROM clause of an outer query, the >> subquery won't get pushed ? I guess this is expected, because as you said >> "Subqueries are not shipped if the whole query is not-shippable". But not >> sure. >> >> > I think I mis-constructed that sentence. The idea is this: while walking > the query before planning to check whether it can FQSed or not, we mark > each subquery in that query as un/shippable irrespective of the parent > query. If the whole query is not shippable, we will use this information to > ship the subquery without planning in subquery_planner(). In the example > that you have given, as of now, we plan both the sides of subquery in > subquery planner. But if we see that they are shippable in > subquery_planner() we can ship them as they are without planning, and then > reduce the join if it's reducible. This is not in the scope of current work. > Ok. > > >> >> ------------ >> >> There is a comment below in xc_FQS_join.sql . I guess we need to correct >> the comment now. >> >> -- JOIN between two distributed relations is not shippable as of now, >> because we >> -- don't store the distribution information completely while deducing >> nodes in >> -- the planner. If we come to that some time in the future, we would see >> change >> -- in the plan below >> >> >> > Ok > > >> ------------ >> >> Can you please correct some typos in the comment : >> >> * pgxc_replace_dist_vars_subquery >> * The function looks up the *member so ExecNodes::en_dist*_var in the >> * query->targetList. If found, they are re-stamped with the given varno >> and >> * resno of the TargetEntry found and added to the new distribution var >> list >> * being created. This function is useful to re-stamp the distribution >> columns >> * of a subquery. >> >> ------------------ >> >> > It should be "members of" right? > That's what I guessed too. > > >> Other than this, I have no more issues, and you can go ahead with the >> commit. >> >> >> Should I commit the patch after taking care of these comments or you want > to have a second glance? > Nope, second glance not needed. go ahead and commit. 
> Thanks >> -Amit >> >> >> >> On 26 February 2013 18:07, Ashutosh Bapat < >> ash...@en...> wrote: >> >>> Hi All, >>> PFA the patch for using FQS for queries with subqueries. Subqueries >>> appear in the query as sublinks or as relations in From clause. The >>> sublinks are in-general shippable if all the relations involved are >>> replicated. I haven't got a rule to handle subqueries where distributed >>> tables are involved. For example, looking at the datanodes where individual >>> sublinks in query be below, should be shipped, I can not come up with a >>> rule to merge them so as to find a set of datanodes where the query can be >>> shipped. >>> >>> select (select tab1.val from tab1 where tab1.val = tab2.val2), tab2.val2 >>> from tab2 where tab2.val = (select sum(val2) from tab1 group by val) >>> >>> In fact, you will notice that the above query is unshippable by FQS. >>> >>> This patch deals with the subqueries in the From clause of the query. >>> The idea is every query result has a set of datanodes where the result can >>> be computed and the distribution columns (sometime computed ones) on which >>> the distribution of the result depends. With this information the existing >>> rules for shippability of join are applicable. This patch includes fixes >>> discussed in threads with subjects "Subqueries and permission checks" and >>> "Annotating ExecNodes with distribution column information". >>> >>> Regression shows no new failures. >>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Enterprise Postgres Company >>> >>> >>> ------------------------------------------------------------------------------ >>> Everyone hates slow websites. So do we. >>> Make your web apps faster with AppDynamics >>> Download AppDynamics Lite for free today: >>> http://p.sf.net/sfu/appdyn_d2d_feb >>> >>> _______________________________________________ >>> Postgres-xc-developers mailing list >>> Pos...@li... >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >>> >> > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > |
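For context, the header comment quoted in the review above describes a re-stamping loop over ExecNodes::en_dist_vars. Below is a rough sketch of that described behaviour; it is not the actual Postgres-XC implementation, and the list elements are assumed to be Var nodes, while equal(), makeVar() and lappend() are stock PostgreSQL node utilities.

#include "postgres.h"
#include "nodes/pg_list.h"
#include "nodes/primnodes.h"
#include "nodes/makefuncs.h"       /* makeVar */

static List *
restamp_dist_vars(List *dist_vars, List *targetList, Index new_varno)
{
    List       *result = NIL;
    ListCell   *lc1;

    foreach(lc1, dist_vars)
    {
        Var        *dist_var = (Var *) lfirst(lc1);
        ListCell   *lc2;

        foreach(lc2, targetList)
        {
            TargetEntry *tle = (TargetEntry *) lfirst(lc2);

            if (equal(tle->expr, dist_var))
            {
                /* re-stamp with the subquery's varno and this TLE's resno */
                result = lappend(result,
                                 makeVar(new_varno,
                                         tle->resno,
                                         dist_var->vartype,
                                         dist_var->vartypmod,
                                         dist_var->varcollid,
                                         0));
                break;
            }
        }
    }
    return result;
}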
From: Ashutosh B. <ash...@en...> - 2013-03-25 11:41:16
|
On Mon, Mar 25, 2013 at 1:54 PM, Amit Khandekar < ami...@en...> wrote: > Hi Ashutosh, > > I only have following few points: > > ------------- > trailing whitespace characters in the patch. > > Will take care of that. > -------------- > > Re: "Subqueries and permission checks", Just to make sure I understand > the scope of the patch ... > The statement "In very near future, we should be able to use the > infrastructure for FQS for shipping sub-queries without planning at the > coordinator, the whole query is not shippable" , can you please elaborate > on this point ? Is it that individual subqueries are not fast-shipped in > some cases if the whole query is not shippable ? For e.g. : > > create table t2 (c1 int, c2 int, c3 int) distribute by hash(c2); > create table t1 (c1 int, c2 int, c3 int) distribute by replication; > > explain select * from (select avg(c1), c2 from t2 group by c2) a1 join > (select avg(c1), c2 from t1 group by c2) b1 using > (c2); QUERY > PLAN > > ---------------------------------------------------------------------------- > Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0) > Node/s: datanode_1, datanode_2 > > Now if I disable FQS, the join does not get reduced. That means, if the > above query is used as a subquery in a FROM clause of an outer query, the > subquery won't get pushed ? I guess this is expected, because as you said > "Subqueries are not shipped if the whole query is not-shippable". But not > sure. > > I think I mis-constructed that sentence. The idea is this: while walking the query before planning to check whether it can FQSed or not, we mark each subquery in that query as un/shippable irrespective of the parent query. If the whole query is not shippable, we will use this information to ship the subquery without planning in subquery_planner(). In the example that you have given, as of now, we plan both the sides of subquery in subquery planner. But if we see that they are shippable in subquery_planner() we can ship them as they are without planning, and then reduce the join if it's reducible. This is not in the scope of current work. > > ------------ > > There is a comment below in xc_FQS_join.sql . I guess we need to correct > the comment now. > > -- JOIN between two distributed relations is not shippable as of now, > because we > -- don't store the distribution information completely while deducing > nodes in > -- the planner. If we come to that some time in the future, we would see > change > -- in the plan below > > > Ok > ------------ > > Can you please correct some typos in the comment : > > * pgxc_replace_dist_vars_subquery > * The function looks up the *member so ExecNodes::en_dist*_var in the > * query->targetList. If found, they are re-stamped with the given varno > and > * resno of the TargetEntry found and added to the new distribution var > list > * being created. This function is useful to re-stamp the distribution > columns > * of a subquery. > > ------------------ > > It should be "members of" right? > Other than this, I have no more issues, and you can go ahead with the > commit. > > > Should I commit the patch after taking care of these comments or you want to have a second glance? > Thanks > -Amit > > > > On 26 February 2013 18:07, Ashutosh Bapat <ash...@en... > > wrote: > >> Hi All, >> PFA the patch for using FQS for queries with subqueries. Subqueries >> appear in the query as sublinks or as relations in From clause. The >> sublinks are in-general shippable if all the relations involved are >> replicated. 
I haven't got a rule to handle subqueries where distributed >> tables are involved. For example, looking at the datanodes where individual >> sublinks in query be below, should be shipped, I can not come up with a >> rule to merge them so as to find a set of datanodes where the query can be >> shipped. >> >> select (select tab1.val from tab1 where tab1.val = tab2.val2), tab2.val2 >> from tab2 where tab2.val = (select sum(val2) from tab1 group by val) >> >> In fact, you will notice that the above query is unshippable by FQS. >> >> This patch deals with the subqueries in the From clause of the query. The >> idea is every query result has a set of datanodes where the result can be >> computed and the distribution columns (sometime computed ones) on which the >> distribution of the result depends. With this information the existing >> rules for shippability of join are applicable. This patch includes fixes >> discussed in threads with subjects "Subqueries and permission checks" and >> "Annotating ExecNodes with distribution column information". >> >> Regression shows no new failures. >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> >> ------------------------------------------------------------------------------ >> Everyone hates slow websites. So do we. >> Make your web apps faster with AppDynamics >> Download AppDynamics Lite for free today: >> http://p.sf.net/sfu/appdyn_d2d_feb >> >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
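For illustration, a minimal SQL sketch of the case this patch targets. The table definition mirrors the replicated t1 used earlier in the review thread; the query and the expected plan shape are assumptions for illustration, not output taken from the patch's regression tests:

    create table t1 (c1 int, c2 int, c3 int) distribute by replication;
    -- With the patch, a FROM-clause subquery over a replicated table should no
    -- longer block fast query shipping of the outer query:
    explain select a1.c2, a1.a
      from (select avg(c1) a, c2 from t1 group by c2) a1
     where a1.a > 10;
    -- Expected (assumed) plan shape:
    --   Data Node Scan on "__REMOTE_FQS_QUERY__"  (cost=0.00..0.00 rows=0 width=0)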
From: Ashutosh B. <ash...@en...> - 2013-03-25 11:13:48
|
Hi All,

I am working on using remote sorting for merge joins. The idea is that while using a merge join at the coordinator, we get the data sorted from the datanodes: for replicated relations we can get all the rows sorted, and for distributed tables we have to get sorted runs which can be merged at the coordinator. For a merge join the sorted inner relation needs to be randomly accessible. For replicated relations this can be achieved by materialising the result. But for distributed relations, we do not materialise the sorted result at the coordinator; instead we compute the sorted result by merging the sorted results from individual nodes on the fly. For distributed relations, the connections to the datanodes themselves are used as logical tapes (which provide the sorted runs). The final result is computed on the fly by choosing the smallest or greatest row (as required) from the connections.

For a Sort node the materialised result can reside in memory (if it fits there) or on one of the logical tapes used for the merge sort. So, in order to provide random access to the sorted result, we need to materialise the result either in memory or on a logical tape. In-memory materialisation is not easily possible since, for distributed relations, we have already resorted to tape-based sort, and to materialise the result on tape there is no logical tape available in the current algorithm. To make it work, there are the following possible ways:

1. When random access is required, materialise the sorted runs from individual nodes onto tapes (one tape for each node) and then merge them onto one extra tape, which can be used for materialisation.

2. Use a mix of connections and logical tapes in the same tape set. Merge the sorted runs from connections onto a logical tape in the same logical tape set.

While the second one looks attractive from a performance perspective (it saves writing to and reading from the tape), it would make the merge code ugly by using mixed tapes. The read calls for a connection and a logical tape are different, and we would need both on the logical tape where the final result is materialised. So I am thinking of going with 1; in fact, to have the same code handle remote sort, use 1 in all cases (whether or not materialisation is required).

Had the original authors of the remote sort code thought about this materialisation? Anything they can share on this topic? Any comments?

-- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Enterprise Postgres Company |
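For reference, a hedged illustration of the kind of plan this work concerns. The table names, GUC settings and commentary are assumptions for illustration, not taken from the patch:

    create table s1 (k int, v int) distribute by hash(k);
    create table s2 (k int, v int) distribute by hash(k);
    -- Join on non-distribution columns so the join cannot be pushed down, and
    -- disable the other join methods so the coordinator picks a merge join:
    set enable_hashjoin = off;
    set enable_nestloop = off;
    explain select * from s1 join s2 on s1.v = s2.v;
    -- If the datanodes return unsorted rows, the coordinator has to sort both
    -- inputs and materialise the inner side for the Merge Join's mark/restore.
    -- The work described above would instead merge sorted runs arriving on the
    -- datanode connections, which is where the question of which tape to
    -- materialise the inner side on arises.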
From: Amit K. <ami...@en...> - 2013-03-25 09:54:47
|
Hi Ashutosh, I only have following few points: ------------- trailing whitespace characters in the patch. -------------- Re: "Subqueries and permission checks", Just to make sure I understand the scope of the patch ... The statement "In very near future, we should be able to use the infrastructure for FQS for shipping sub-queries without planning at the coordinator, the whole query is not shippable" , can you please elaborate on this point ? Is it that individual subqueries are not fast-shipped in some cases if the whole query is not shippable ? For e.g. : create table t2 (c1 int, c2 int, c3 int) distribute by hash(c2); create table t1 (c1 int, c2 int, c3 int) distribute by replication; explain select * from (select avg(c1), c2 from t2 group by c2) a1 join (select avg(c1), c2 from t1 group by c2) b1 using (c2); QUERY PLAN ---------------------------------------------------------------------------- Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0) Node/s: datanode_1, datanode_2 Now if I disable FQS, the join does not get reduced. That means, if the above query is used as a subquery in a FROM clause of an outer query, the subquery won't get pushed ? I guess this is expected, because as you said "Subqueries are not shipped if the whole query is not-shippable". But not sure. ------------ There is a comment below in xc_FQS_join.sql . I guess we need to correct the comment now. -- JOIN between two distributed relations is not shippable as of now, because we -- don't store the distribution information completely while deducing nodes in -- the planner. If we come to that some time in the future, we would see change -- in the plan below ------------ Can you please correct some typos in the comment : * pgxc_replace_dist_vars_subquery * The function looks up the *member so ExecNodes::en_dist*_var in the * query->targetList. If found, they are re-stamped with the given varno and * resno of the TargetEntry found and added to the new distribution var list * being created. This function is useful to re-stamp the distribution columns * of a subquery. ------------------ Other than this, I have no more issues, and you can go ahead with the commit. Thanks -Amit On 26 February 2013 18:07, Ashutosh Bapat <ash...@en...>wrote: > Hi All, > PFA the patch for using FQS for queries with subqueries. Subqueries appear > in the query as sublinks or as relations in From clause. The sublinks are > in-general shippable if all the relations involved are replicated. I > haven't got a rule to handle subqueries where distributed tables are > involved. For example, looking at the datanodes where individual sublinks > in query be below, should be shipped, I can not come up with a rule to > merge them so as to find a set of datanodes where the query can be shipped. > > select (select tab1.val from tab1 where tab1.val = tab2.val2), tab2.val2 > from tab2 where tab2.val = (select sum(val2) from tab1 group by val) > > In fact, you will notice that the above query is unshippable by FQS. > > This patch deals with the subqueries in the From clause of the query. The > idea is every query result has a set of datanodes where the result can be > computed and the distribution columns (sometime computed ones) on which the > distribution of the result depends. With this information the existing > rules for shippability of join are applicable. This patch includes fixes > discussed in threads with subjects "Subqueries and permission checks" and > "Annotating ExecNodes with distribution column information". 
> > Regression shows no new failures. > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://p.sf.net/sfu/appdyn_d2d_feb > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > |
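As a hedged illustration of what the distribution-column bookkeeping in this patch is meant to enable (the t2 definition mirrors the one used above; the expected behaviour is an assumption, not verified against xc_FQS_join.sql): when the distribution columns of a subquery's result are tracked, joining a hash-distributed table back to an aggregate over it on the distribution column is the kind of case expected to become shippable:

    create table t2 (c1 int, c2 int, c3 int) distribute by hash(c2);
    explain select *
      from t2
      join (select c2, count(*) n from t2 group by c2) agg using (c2);
    -- Both sides are distributed by hash(c2) and the join is on c2, so the whole
    -- query is expected (not verified) to ship as a single Data Node Scan.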
From: Koichi S. <koi...@gm...> - 2013-03-24 03:45:29
|
2013/3/21 Bei Xu <be...@ad...>:
> Hi, Koichi:
> Based on your reply,
>
> Since the slave is a copy of the master, the slave has the same GTM_proxy listed
> in postgresql.conf as the master, so it will connect to server3's proxy AFTER
> THE SLAVE IS STARTED,
> And we will only change the slave's proxy to server 4 AFTER promotion,
> correct?
>
> Thus, it looks like the SLAVE needs to connect to A PROXY at ALL TIMES: before
> promotion it is server3's proxy, after promotion it is server 4's proxy.

No, the master doesn't go to the proxy to connect. Instead, the proxy goes to the master. GTM is a server and the proxy is a client. So, just as applications have to reconnect to the new PostgreSQL master when it fails over, the GTM proxy should reconnect to the new GTM master.

> Please take a look at the following 2 scenarios:
> Scenario 1: If the slave was configured with server4's proxy AFTER THE SLAVE IS
> STARTED, upon server 3 failure, we will do:
> 1) promote the slave
> Since the slave is already connected to server 4's proxy, we don't have to do
> anything here.

The GTM proxy should not be connected to the slave until it fails over. That would make the whole cluster's transaction status inconsistent. Please take a look at the bash version of pgxc_ctl, which describes how the gtm master can be handled.

> Scenario 2: If the slave was configured with server3's proxy AFTER THE SLAVE IS
> STARTED, upon server 3 failure, we will do:
> 1) restart the slave to change the proxy from server3's proxy value to server4's
> proxy value
> 2) promote the slave
>
> Obviously, scenario 1 has fewer steps and is simpler; scenario 2 is suggested by
> you. Is there any reason you suggested scenario 2?
>
> My concern is, if a slave is connected to any active proxy (the proxy is
> started and pointing to the GTM), will the transaction be applied TWICE?
> One from the proxy, one from the master?

The proxy is just a proxy. When a transaction starts and a GXID is given, and then the master fails and the slave takes over, it carries over that GXID status. When you are finished, because the coordinators continue to connect to the same gtm proxy, the coordinator will report that the transaction is finished. It is transparent.

When a coordinator fails, the transaction fails too. In this case, the coordinator is failed over by its slave. The slave should reconnect to the local gtm proxy. Of course, if the old gtm proxy is running, the failed-over coordinator can connect to its original (remote) gtm proxy. However, that will waste network traffic, so it is highly recommended to connect to the local one. Also, in this case, all the other coordinators should be notified that it is now at a different access point, if you don't use a VIP to carry over the IP address. This notification can be done with the ALTER NODE statement. Datanode failover can be handled similarly.

Please take a look at the scripts in pgxc_ctl (bash version), which comes with all of these steps.

Best;
---
Koichi Suzuki

> On 3/21/13 12:40 AM, "Koichi Suzuki" <koi...@gm...> wrote:
>>Only after promotion. Before promotion, they will not be connected
>>to gtm_proxy.
>>
>>Regards;
>>----------
>>Koichi Suzuki
>>
>>
>>2013/3/21 Bei Xu <be...@ad...>:
>>> Hi Koichi:
>>> Thanks for the reply. I still have doubts for item 1. If we setup
>>> proxy on server 4, do we reconfigure server 4's coordinator/datanodes to
>>> point to server 4's proxy at ALL TIME (after replication is setup, I can
>>> change gtm_host to point to server4's proxy before I bring up slaves) or
>>> only AFTER promotion?
>>>
>>>
>>> On 3/20/13 11:08 PM, "Koichi Suzuki" <koi...@gm...> wrote:
>>>
>>>>1.
It's better to have gtm proxy at server 4 when you failover to this >>>>server. We need gtm proxy now to failover GTM while >>>>coordinators/datanodes are running. When you simply make a copy of >>>>coordinator/datanode with pg_basebackup and promote them, they will >>>>try to connect to gtm_proxy at server3. You need to reconfigure them >>>>to connect to gtm_proxy at server4. >>>> >>>>2. Only one risk is the recovery point could be different from >>>>component to component, I mean, some transaction may be committed at >>>>some node but aborted at another because there could be some >>>>difference in available WAL records. It may possible to improve the >>>>core to handle this to some extent but please understand there will be >>>>some corner case, especially if DDL is involved in such a case. This >>>>chance could be small and you may be able to correct this manually or >>>>this can be allowed in some applications. >>>> >>>>Regards; >>>>---------- >>>>Koichi Suzuki >>>> >>>> >>>>2013/3/21 Bei Xu <be...@ad...>: >>>>> Hi, I want to set up HA for pgxc, please see below for my current >>>>>setup. >>>>> >>>>> server1: 1 GTM >>>>> server2: 1 GTM_Standby >>>>> server3 (master): 1 proxy >>>>> 1 coordinator >>>>> 2 datanode >>>>> >>>>> Server4: (stream replication slave) : 1 standalone proxy ?? >>>>> 1 replicated coordinator (slave of >>>>> server3's coordinator) >>>>> 2 replicated datanode (slave of >>>>> server3's datanodes) >>>>> >>>>> >>>>> server3's coordinator and datanodes are the master of the server4's >>>>> coordinator/datanodes by stream replication. >>>>> >>>>> Question. >>>>> 1. Should there be a proxy on server 4? If not, which proxy should >>>>>the >>>>> server4's coordinator and datanodes pointing to? (I have to specify >>>>>the >>>>> gtm_host in postgresql.conf)/ >>>>> 2. Do I have to use synchronous replication vs Asynchrous replication? >>>>>I am >>>>> currently using Asynchrnous replication because I think if I use >>>>> synchronous, slave failour will affect master. >>>>> >>>>> >>>>>----------------------------------------------------------------------- >>>>>-- >>>>>----- >>>>> Everyone hates slow websites. So do we. >>>>> Make your web apps faster with AppDynamics >>>>> Download AppDynamics Lite for free today: >>>>> http://p.sf.net/sfu/appdyn_d2d_mar >>>>> _______________________________________________ >>>>> Postgres-xc-developers mailing list >>>>> Pos...@li... >>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>> >>>> >>> >>> >> > > |
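To make the coordinator-failover notification mentioned above concrete, a hedged sketch; the node name, host and port below are placeholders, not values from this thread. After promoting the coordinator slave on server4, run on each of the surviving coordinators:

    ALTER NODE coord_1 WITH (HOST = 'server4', PORT = 5432);
    SELECT pgxc_pool_reload();
    -- coord_1, 'server4' and 5432 are assumptions; substitute the failed
    -- coordinator's node name and the address/port its promoted slave listens on.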