From: Abbas B. <abb...@en...> - 2013-04-24 17:25:54
Hi,

The test cases were failing for a four-datanode cluster. The reason was that some tables were not created on a well-defined datanode set.

--
*Abbas*
Architect
Ph: 92.334.5100153
Skype ID: gabbasb
www.enterprisedb.com

Follow us on Twitter @EnterpriseDB
Visit EnterpriseDB for tutorials, webinars, whitepapers and more
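For context, a sketch of how a test table can be pinned to an explicit datanode set in Postgres-XC so that its placement does not depend on cluster size; the table and node names here are hypothetical:

    -- Create the table on a well-defined set of datanodes instead of
    -- letting it default to whatever nodes the cluster happens to have.
    CREATE TABLE test_tab (id int, val text)
        DISTRIBUTE BY HASH (id)
        TO NODE (data_node_1, data_node_2);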
From: Ashutosh B. <ash...@en...> - 2013-04-24 14:04:42
Good, that works. This bug is causing testcase truncate to fail.

On Wed, Apr 24, 2013 at 6:53 PM, Nikhil Sontakke <ni...@st...> wrote:
> By the EOW?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
From: Nikhil S. <ni...@st...> - 2013-04-24 13:24:11
Hi Ashutosh,

By the EOW?

Regards,
Nikhils

On Wed, Apr 24, 2013 at 6:49 PM, Ashutosh Bapat <ash...@en...> wrote:
> Hi Nikhil,
> Thanks for taking this up. By when do you think you can provide the patch?

--
StormDB - http://www.stormdb.com
The Database Cloud
From: Ashutosh B. <ash...@en...> - 2013-04-24 13:19:52
Hi Nikhil,

Thanks for taking this up. By when do you think you can provide the patch?

On Wed, Apr 24, 2013 at 6:01 PM, Nikhil Sontakke <ni...@st...> wrote:
> I guess setval() was handled but we forgot to handle reset sequence. I will take this up when I clean up currval, nextval for negative sequences.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
From: Nikhil S. <ni...@st...> - 2013-04-24 12:32:08
> ResetSequence(), the function being called from ExecuteTruncate(), does not send a reset message to GTM. It applies sequence changes locally on the coordinator, which is not enough.
>
> Can someone with relevant experience look into this problem and provide a fix?
>
> I have attached the testcase and its output showing the bug.

I guess setval() was handled but we forgot to handle reset sequence. I will take this up when I clean up currval, nextval for negative sequences.

Regards,
Nikhils
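A minimal repro sketch of the behavior described above, assuming a serial column backed by a GTM-managed sequence (table name hypothetical):

    CREATE TABLE reset_bug (id serial, val text);
    INSERT INTO reset_bug (val) VALUES ('a'), ('b');  -- GTM hands out 1, 2
    TRUNCATE reset_bug RESTART IDENTITY;              -- ExecuteTruncate() -> ResetSequence()
    INSERT INTO reset_bug (val) VALUES ('c');
    SELECT id FROM reset_bug;  -- expected 1; with the bug, GTM may still hand out 3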
From: Koichi S. <koi...@gm...> - 2013-04-24 02:35:24
It is a system-level error which may not happen so often in production. The cause is a failure in a semaphore operation through the Linux system call semop(). The failure may occur if you leave too many semaphores unreleased in your system. If you kill too many postmaster processes of coordinators/datanodes, they may lose the chance to release these kernel resources.

Please try ipcs -s (shows semaphores) and ipcs -m (shows shared memory) for all the users in your system. It may help to see what's going on.

----------
Koichi Suzuki

2013/4/24 Venky Kandaswamy <ve...@ad...>
> One of the datanodes (but not always the same node) gets a PANIC from the kernel and aborts. This causes the entire system to freeze.
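A sketch of the inspection and cleanup this suggests; the id shown is hypothetical, and an IPC id should only be removed after confirming that the postmaster owning it is dead:

    # List System V semaphore sets and shared memory segments, per user
    ipcs -s
    ipcs -m

    # Remove a leaked semaphore set by id once its owner is confirmed dead
    # (removing one that is still in use will crash the server)
    ipcrm -s 189366400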
From: Venky K. <ve...@ad...> - 2013-04-23 19:55:39
Hi All,

One of the datanodes (but not always the same node) gets a PANIC from the kernel and aborts. This causes the entire system to freeze. We cannot get any new connections and some transactions go into waiting. The messages we are seeing in the logs are cryptic:

{time user process message}
2013-04-23 11:42:49 PDT adchemy 17573 PANIC: semop(id=189366400) failed: Invalid argument
2013-04-23 11:42:49 PDT adchemy 17561 PANIC: semop(id=189366400) failed: Invalid argument
2013-04-23 11:42:50 PDT adchemy 17563 PANIC: semop(id=189366400) failed: Invalid argument
2013-04-23 11:42:50 PDT adchemy 17576 PANIC: semop(id=189366400) failed: Invalid argument
2013-04-23 12:08:31 PDT adchemy 17801 PANIC: semop(id=203915392) failed: Invalid argument
2013-04-23 12:08:31 PDT adchemy 17789 PANIC: semop(id=203915392) failed: Invalid argument
2013-04-23 12:08:31 PDT adchemy 17787 PANIC: semop(id=203915392) failed: Invalid argument
2013-04-23 12:08:33 PDT adchemy 17803 PANIC: semop(id=203915392) failed: Invalid argument
2013-04-23 12:08:33 PDT adchemy 17791 PANIC: semop(id=203915392) failed: Invalid argument
2013-04-23 12:09:31 PDT analytics 17812 PANIC: semop(id=203882623) failed: Invalid argument
2013-04-23 12:09:31 PDT adchemy 17797 PANIC: semop(id=203882623) failed: Invalid argument
2013-04-23 12:14:11 PDT analytics 17785 PANIC: semop(id=203882623) failed: Invalid argument
2013-04-23 12:14:11 PDT adchemy 17805 PANIC: semop(id=203882623) failed: Invalid argument
2013-04-23 12:16:11 PDT analytics 17783 PANIC: semop(id=203915392) failed: Invalid argument
2013-04-23 12:16:11 PDT analytics 17784 PANIC: semop(id=203915392) failed: Invalid argument
2013-04-23 12:16:11 PDT analytics 17786 PANIC: semop(id=203915392) failed: Invalid argument
2013-04-23 12:23:32 PDT analytics 17938 PANIC: semop(id=210141312) failed: Invalid argument
2013-04-23 12:23:32 PDT analytics 18544 PANIC: semop(id=210141312) failed: Invalid argument
2013-04-23 12:23:32 PDT analytics 17937 PANIC: semop(id=210141312) failed: Invalid argument
2013-04-23 12:23:32 PDT analytics 17951 PANIC: semop(id=210141312) failed: Invalid argument

Any thoughts on why this might happen?

________________________________________
Venky Kandaswamy
Principal Engineer, Adchemy Inc.
925-200-7124
From: 鈴木 幸市 <ko...@in...> - 2013-04-23 00:39:58
In 1.2 or later, I think it will be a good idea to "pre-shutdown" the datanode so that no further connections from other nodes are allowed, except for psql or other direct applications, perhaps by superusers. This would allow any local housekeeping operation needed to remove the node in a more polite way.

Thank you;
---
Koichi Suzuki

On 2013/04/23, at 5:29, Abbas Butt <abb...@en...> wrote:
> I was not taking into account the fact that after tab1 has been redistributed any DMLs will not target the removed node. I agree we do not need any DML blocking. I will commit the updated steps in the repository.
From: Abbas B. <abb...@en...> - 2013-04-22 20:30:19
On Mon, Apr 22, 2013 at 2:30 PM, 鈴木 幸市 <ko...@in...> wrote:
> As long as the DBA issues ALTER TABLE correctly on all the tables, there's no chance to bring XC into an inconsistent status, so I don't think we need DML blocking.
>
> Any other inputs?

I was not taking into account the fact that after tab1 has been redistributed any DMLs will not target the removed node. I agree we do not need any DML blocking. I will commit the updated steps in the repository.

--
*Abbas*
Architect
Ph: 92.334.5100153
Skype ID: gabbasb
www.enterprisedb.com

Follow us on Twitter @EnterpriseDB
Visit EnterpriseDB for tutorials, webinars, whitepapers and more
From: Koichi S. <koi...@gm...> - 2013-04-22 10:04:26
Now we're entering the phase to prepare the next major release. Code for the major features is about to be out, so I'd like to share how to build a new branch for PGXC 1.1. I should be a bit careful because we made a serious mistake when we merged our master with the PG 9.2.3 branch. Here are the steps I propose:

1. Wait for the latest commits for trigger and pgxc_ctl (maybe tomorrow).
2. Merge with PG master just before PG cut REL9_2_STABLE.
3. Fix the regressions and bugs.
4. Cut the REL1_1_STABLE branch.
5. Merge 9.2.4 into the REL1_1_STABLE branch.
6. Fix the regressions and bugs.
7. (If possible) merge PG master into XC master.

Any comments/inputs are welcome.

Best;
----------
Koichi Suzuki
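A hypothetical git rendering of those steps; the merge point in step 2 is a placeholder, and REL9_2_4 assumes PostgreSQL's upstream tag naming:

    # Step 2: merge the PG master commit just before REL9_2_STABLE was cut
    git checkout master
    git merge <commit-before-REL9_2_STABLE>

    # Step 4: cut the release branch once regressions are fixed
    git checkout -b REL1_1_STABLE

    # Step 5: merge the PostgreSQL 9.2.4 tag into the release branch
    git merge REL9_2_4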
From: 鈴木 幸市 <ko...@in...> - 2013-04-22 09:30:21
In this case, after ALTER TABLE REDISTRIBUTE is issued against tab1, yes, DML to tab2 may target the datanode being removed. However, because tab1 has been redistributed, no DML to tab1 will target that datanode.

Then, when ALTER TABLE REDISTRIBUTE is issued against tab2, all the data are redistributed, and after that no DML to tab2 will be targeted to the datanode to remove.

We discussed this issue about a year ago: whether we should exclude and redistribute all the tables before removing a datanode. We concluded that it should be the DBA's responsibility to exclude the datanode to be removed from all the distributions, manually or automatically.

As long as the DBA issues ALTER TABLE correctly on all the tables, there's no chance to bring XC into an inconsistent status, so I don't think we need DML blocking.

Any other inputs?

---
Koichi Suzuki

On 2013/04/22, at 16:42, Abbas Butt <abb...@en...> wrote:
> Since the administrator has to issue ALTER TABLE REDISTRIBUTE table by table for all tables in all databases, client C1 would always have a chance to insert more rows into a table for which the administrator has already issued ALTER TABLE REDISTRIBUTE. For this reason we need DML blocking.
From: Nikhil S. <ni...@st...> - 2013-04-22 08:19:28
The currval() logic also has a similar issue.

Regards,
Nikhils

On Mon, Apr 22, 2013 at 6:18 AM, 鈴木 幸市 <ko...@in...> wrote:
> I'm afraid the cause is not only in gtm. If min_value is not specified (or specified as an "invalid" value), gtm assigns -0x7ffffffffffffffeLL as the minimum value of the sequence (see gtm_seq.[ch]). So I suspect that the minimum value will be specified by DDL in the coordinator.

--
StormDB - http://www.stormdb.com
The Database Cloud
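A companion sketch for the currval() remark, using the same descending-sequence setup reported below in this thread (sequence name hypothetical):

    CREATE SEQUENCE negseq INCREMENT BY -2;
    SELECT nextval('negseq');   -- -1 once descending sequences are handled
    SELECT currval('negseq');   -- should echo the last value, -1, not error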
From: Abbas B. <abb...@en...> - 2013-04-22 07:42:24
Consider this case:

Assume two tables in a database, tab1 and tab2, both distributed by round robin. Assume a client C1 is connected to the cluster and is running a loop inserting rows into tab1. Assume the administrator connects to the cluster and issues ALTER TABLE REDISTRIBUTE for table tab2. The moment this ALTER finishes, assume C1 starts inserting rows into tab2, while the administrator issues ALTER TABLE REDISTRIBUTE for table tab1, thinking tab2 is clear.

Since the administrator has to issue ALTER TABLE REDISTRIBUTE table by table for all tables in all databases, client C1 would always have a chance to insert more rows into a table for which the administrator has already issued ALTER TABLE REDISTRIBUTE.

For this reason we need DML blocking.

Comments/suggestions are welcome.

On Mon, Apr 22, 2013 at 11:15 AM, 鈴木 幸市 <ko...@in...> wrote:
> So again, I'm not yet sure if DML blocking is still needed.

--
*Abbas*
Architect
Ph: 92.334.5100153
Skype ID: gabbasb
www.enterprisedb.com

Follow us on Twitter @EnterpriseDB
Visit EnterpriseDB for tutorials, webinars, whitepapers and more
From: 鈴木 幸市 <ko...@in...> - 2013-04-22 06:16:04
Sorry Abbas, I have a question/comment on removing a datanode.

Before the DBA removes a datanode, he/she must run ALTER TABLE to move all the data in the node to the others. Therefore, I'm not sure if we need another means to lock DML.

I understand TMP tables will be an issue. If a TMP table is created over multiple nodes, it is under 2PC control, which is not allowed so far. So what we can do is create TMP tables on a particular node only. If that is the datanode being removed, the operation will fail, and I think this can keep the whole cluster in a consistent status.

Yes, it is all the DBA's responsibility to make sure that no data are left on the datanode. We have a means to make sure of that, as you submitted.

So again, I'm not yet sure if DML blocking is still needed.

Regards;
---
Koichi Suzuki

On 2013/04/20, at 5:22, Abbas Butt <abb...@en...> wrote:
> Here are the proposed steps to remove a node from the cluster.
From: 鈴木 幸市 <ko...@in...> - 2013-04-22 00:58:06
Thanks; I will include these steps in pgxc_ctl.

Regards;
---
Koichi Suzuki

On 2013/04/20, at 5:22, Abbas Butt <abb...@en...> wrote:
> Here are the proposed steps to remove a node from the cluster.
From: 鈴木 幸市 <ko...@in...> - 2013-04-22 00:48:40
I'm afraid the cause is not only in gtm. If min_value is not specified (or specified as an "invalid" value), gtm assigns -0x7ffffffffffffffeLL as the minimum value of the sequence (see gtm_seq.[ch]). So I suspect that the minimum value will be specified by DDL in the coordinator.

Your fix is very welcome.

GTM in the current master has a backup feature which dumps the next restore point of the gxid and each sequence to gtm.control. I hope this helps tests to some extent.

Regards;
---
Koichi Suzuki

On 2013/04/22, at 8:32, Michael Paquier <mic...@gm...> wrote:
> Adding a regression test in xc_sequence.sql would also be good to cover this case, as this is caused by XC-only code.
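To make the min_value point concrete, the boundaries a descending sequence gets when they are spelled out explicitly in the DDL (name hypothetical; illustrative only, not a verified workaround for the reported error):

    -- Explicit bounds mean the coordinator sends definite values to GTM
    -- instead of relying on defaults.
    CREATE SEQUENCE negseq2 INCREMENT BY -2 MINVALUE -1000 MAXVALUE -1 START -1;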
From: Michael P. <mic...@gm...> - 2013-04-21 23:32:25
On Mon, Apr 22, 2013 at 3:57 AM, Nikhil Sontakke <ni...@st...> wrote:
> ISTM, the sequence fetching logic on the backend side has been coded entirely with the assumption that sequences can never have negative values. So a less-than-0 value returned from get_next() is assumed to be a problem! That's completely wrong. We need to deal with this properly. I will submit a patch for this when I get some time soon.

Adding a regression test in xc_sequence.sql would also be good to cover this case, as this is caused by XC-only code.

--
Michael
From: Nikhil S. <ni...@st...> - 2013-04-21 18:58:05
Hi all,

Consider the following:

    create sequence seqname increment by -2;
    select nextval('seqname');
    ERROR: GTM error, could not obtain sequence value

The above should have returned -1.

ISTM, the sequence fetching logic on the backend side has been coded entirely with the assumption that sequences can never have negative values. So a less-than-0 value returned from get_next() is assumed to be a problem! That's completely wrong. We need to deal with this properly. I will submit a patch for this when I get some time soon.

Regards,
Nikhils

--
StormDB - http://www.stormdb.com
The Database Cloud
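For reference, the behavior a fix would need to produce, sketched as the kind of case a regression test in xc_sequence.sql (suggested above in this thread) could cover; the sequence name is hypothetical:

    CREATE SEQUENCE negseq INCREMENT BY -2;
    -- A descending sequence starts at its default max_value, which is -1.
    SELECT nextval('negseq');   -- expected: -1
    SELECT nextval('negseq');   -- expected: -3
    DROP SEQUENCE negseq;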
From: Abbas B. <abb...@en...> - 2013-04-19 20:22:56
|
Hi, Here are the proposed steps to remove a node from the cluster. Removing an existing coordinator ========================== Assume a two-coordinator cluster, COORD_1 & COORD_2. Suppose we want to remove COORD_2 for any reason. 1. Stop the coordinator to be removed. In our example we need to stop COORD_2. 2. Connect to any of the coordinators except the one to be removed. In our example, assuming COORD_1 is running on port 5432, the following command would connect to COORD_1 psql postgres -p 5432 3. Drop the coordinator to be removed. For example, to drop coordinator COORD_2 DROP NODE COORD_2; 4. Update the connection information cached in the pool. SELECT pgxc_pool_reload(); COORD_2 is now removed from the cluster, and COORD_1 works as if COORD_2 never existed. CAUTION: If COORD_2 is still running and clients are connected to it, any queries issued would create inconsistencies in the cluster. Please note that there is no need to block DDLs, because either way DDLs will fail after step 1 and before step 4. Removing an existing datanode ========================= Assume a two-coordinator cluster, COORD_1 & COORD_2, with three datanodes DATA_NODE_1, DATA_NODE_2 & DATA_NODE_3. Suppose we want to remove DATA_NODE_3 for any reason. Further assume there is a table named rr_abc distributed in round-robin fashion that has rows on all three datanodes. 1. Block DMLs, so that while we are shifting data off the datanode to be removed in step 2, no one can have an insert process inserting data into it. Here we will need to add a system function similar to pgxc_lock_for_backup. This is a to-do item. 2. Transfer the data from the datanode to be removed to the rest of the datanodes, for all the tables in all the databases. For example, to shift data of the table rr_abc to the rest of the nodes we can use the command ALTER TABLE rr_abc DELETE NODE (DATA_NODE_3); 3. Confirm that there is no data left on the datanode to be removed. For example, to confirm that there is no data left on DATA_NODE_3 select c.pcrelid from pgxc_class c, pgxc_node n where n.node_name = 'DATA_NODE_3' and n.oid = ANY (c.nodeoids); 4. Stop the datanode server to be removed. Now any SELECTs that involve the datanode to be removed would start failing, and DMLs have already been blocked, so essentially the cluster would work only partially. 5. Connect to any of the coordinators. In our example, assuming COORD_1 is running on port 5432, the following command would connect to COORD_1 psql postgres -p 5432 6. Drop the datanode to be removed. For example, to drop datanode DATA_NODE_3 use the command DROP NODE DATA_NODE_3; 7. Update the connection information cached in the pool. SELECT pgxc_pool_reload(); 8. Repeat steps 5, 6 & 7 for all the coordinators in the cluster (a small sketch of automating this loop follows after this message). 9. Un-block DMLs. DATA_NODE_3 is now removed from the cluster. Comments are welcome. -- *Abbas* Architect Ph: 92.334.5100153 Skype ID: gabbasb www.enterprisedb.com <http://www.enterprisedb.com/> * Follow us on Twitter* @EnterpriseDB Visit EnterpriseDB for tutorials, webinars, whitepapers <http://www.enterprisedb.com/resources-community> and more <http://www.enterprisedb.com/resources-community> |
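Steps 5-7 above have to be repeated on every coordinator (step 8). A hedged libpq sketch of that loop, where the connection strings are placeholders for a real deployment and the node name is the example's DATA_NODE_3:

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /* Placeholder conninfo strings, one per coordinator. */
        const char *coords[] = {
            "host=coord1 port=5432 dbname=postgres",
            "host=coord2 port=5433 dbname=postgres",
        };
        /* Steps 6 and 7, run on each coordinator in turn. */
        const char *cmds[] = {
            "DROP NODE DATA_NODE_3;",
            "SELECT pgxc_pool_reload();",
        };

        for (int i = 0; i < 2; i++)
        {
            PGconn *conn = PQconnectdb(coords[i]);

            if (PQstatus(conn) != CONNECTION_OK)
            {
                fprintf(stderr, "%s", PQerrorMessage(conn));
                PQfinish(conn);
                return 1;
            }
            for (int j = 0; j < 2; j++)
            {
                PGresult *res = PQexec(conn, cmds[j]);

                if (PQresultStatus(res) != PGRES_COMMAND_OK &&
                    PQresultStatus(res) != PGRES_TUPLES_OK)
                    fprintf(stderr, "%s", PQerrorMessage(conn));
                PQclear(res);
            }
            PQfinish(conn);
        }
        return 0;
    }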
From: Amit K. <ami...@en...> - 2013-04-19 11:33:25
|
On 17 April 2013 16:46, Ashutosh Bapat <ash...@en...> wrote: > Hi Amit, > Thanks for completing this tedious work. It's pretty complicated. > > As I understand it, the patch deals with the following things > > For after row triggers, PG stores the fireable triggers as events with > the ctid of the row as a pointer to the row on which the event should be carried > out. For INSERT and DELETE there is only one ctid, viz. new or old resp. For > UPDATE, the ctids of both the new and old rows are stored. For some reason (I am > not clear about the reasons) after row triggers are fired after queueing > the events, and thus we need some storage on the coordinator to store the > affected tuples, so that they can be retrieved while firing the triggers. > We do not save the entire tuple in the trigger event, to save memory if > there are multiple events that need the same tuple. In PG, the ctid of the row > suffices to fetch the row from the heap, which acts as the storage itself. > In XC, however, we need some storage to store the tuples to be fed to > trigger events, and need a pointer for each row stored. This pointer will be > saved in the trigger event, and will be used to fetch the row. Your patch > uses two tuplestores to store old and new rows resp. For UPDATE we will use > both tuplestores, but for INSERT we will use only one of them. > > Here are my comments > 1. As I understand, the tuplestore has a different kind of pointer than > ctid, and thus you have created a union in the trigger event structure. Can we > use hash-based storage instead of a tuplestore? The hash-based storage will > have advantages like a. the existing ctid, nodeoid combination can be used as > the key in the hash store, thus not requiring any union (but it will need an > OID member added). The same ItemPointer structure can then be used, instead of > creating prototypes for XC. b. Hash is ideally a random-access storage, > unlike tuplestore, which needs some kind of sequential access. c. At places > there is code to first get a pointer in the tuplestore before actually adding > the row, which complicates the code. Hash storage will not have this > problem, since the key is independent of the position in the hash storage. > > 2. Using two separate tuplestores for new and old tuples is a waste of > memory. A tuplestore allocates 1K of memory by default, thus having two tuple > stores requires double the amount of memory. If, in the worst case, the > tuplestores overflow to disk, we will have two files created on the file > system, causing interleaved (effectively random) writes on disk, which will affect > performance. This will mean that the same row pointer cannot be used for > OLD and NEW, but that should be fine, as PG itself doesn't need that > condition. > > 3. The tuplestore management code is too tightly tied to the static > structures in trigger.c. We need to isolate this code in a separate file, > so that this approach can be used for other features like constraints if > required. Please separate this code into a separate file with well-defined > interfaces like functions to add a row to storage, get its pointer, fetch > the row from storage, delete the row from storage (?), destroy the storage, > etc., and use them for the trigger functionality. In the same file, we need a > prologue describing the need for these interfaces and a description of the > interfaces themselves. In fact, if this infrastructure is also needed in PG, we > should put it in PG. > > 4. While using two tuplestores we have hardcoded the tuplestore indices as > 0 and 1. Instead of that, can we use some macros, OR even better, use > different variables for both of them? The same goes for all the 2-sized arrays that > are defined for the same purpose. > > 5. Please look at the current trigger regression tests. If they do not > cover all the possible test scenarios, please add them to the regression. > Testing all the scenarios (various combinations of trigger types and DMLs) is > critical here. > > If you find that the current implementation is working fine, all the above > points can be taken up later, after the 1.1 release. The testing can be taken > up between beta and GA, and the others can be taken up in the next release. But > it's important to at least study these approaches. > Thanks Ashutosh for the valuable comments and for your patience in reviewing this work. I will come back to the above points once I commit the trigger support. > Some specific comments > 1. In function pgxc_ar_init_rowstore(), we have used palloc0 + memcpy + > pfree() instead of repalloc + zeroing the new entries. Repalloc allows extending > the existing memory allocation without moving the contents (if > possible) and has the advantage that it wouldn't fail when the sum of the allocated > memory and the required memory is greater than the available memory but the required > memory alone is less than the available memory. So it's always advantageous to > use repalloc. Why haven't we used repalloc here? > > I had thought palloc0 + pfree would be simpler, but the repalloc code turned out to be just as simple. I have changed the code to use repalloc. Thanks. (A short sketch of the two patterns follows after this message.) > 2. Can we extend pgxc_ar_goto_end() to be a goto_anywhere function, where > end is a special position? E.g. pgxc_ar_goto(ARTupInfo, Pos), where Pos can > be a valid index OR the special position END. pgxc_ar_goto(ARTupInfo, END) > would act as pgxc_ar_goto_end(), and pgxc_ar_goto(ARTupInfo, Pos != END) > would replace the tuplestore advance loop in pgxc_ar_dofetch(). The > function may accept a backwards flag if that is required. > So suppose in pgxc_ar_dofetch() I extract the first part, which scans the tuplestore up to the fetchpos, into a new function pgxc_ar_goto(), and then continue the actual fetch part in the same function pgxc_ar_dofetch(). Then in pgxc_ar_goto() we would have to special-case this goto_end() functionality. The advance_by counter cannot be used to get to the required position, because going to the end requires the tuplestore_eof() condition, and that condition cannot be mixed into that code. Also, the advance_by-related code has to be skipped. So, since we anyway have two different code paths for the two scenarios, I think it is better to keep a separate function pgxc_ar_gotoend() for the scan-up-to-end scenario. > > On Mon, Apr 15, 2013 at 9:54 AM, Amit Khandekar < > ami...@en...> wrote: > >> >>> On Fri, Apr 5, 2013 at 2:38 PM, Amit Khandekar < >>> ami...@en...> wrote: >>> >>>> FYI .. I will use the following document to keep updating the >>>> implementation details for "Saving AR trigger rows in tuplestore": >>>> >>>> https://docs.google.com/document/d/158IPS9npmfNsOWPN6ZYgPy91aowTUNP7L7Fl9zBBGqs/edit?usp=sharing >>>> >>> >> Attached is the patch to support after-row triggers. The above doc is >> updated. Yet to analyse the regression tests. The attached test.sql is the >> one I used for unit testing; it is not yet ready to be inserted into the >> regression suite. I will be working next on the regression and Ashutosh's >> comments on before-row triggers. >> >> Also, I haven't yet rebased the rowtriggers branch over the new >> merge-related changes in the master branch. This patch is over the >> rowtriggers branch; I did not push this patch onto the rowtriggers branch >> as well, although I intended to, but suspected some possible >> issues if I push the rowtriggers branch after the recent merge-related >> changes going on in the repository. First I will rebase all the rowtriggers >> branch changes onto the new master branch. >> >>> >>> -- >>> Pavan Deolasee >>> http://www.linkedin.com/in/pavandeolasee > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company |
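On the repalloc point above, a minimal sketch of the two growth patterns in backend-style C; palloc0/repalloc/pfree and memcpy/memset are the stock PostgreSQL/libc calls, while rows, old_size and new_size are illustrative:

    /* Old pattern: allocate-copy-free transiently needs
     * old_size + new_size bytes at the same time. */
    char *grown = palloc0(new_size);
    memcpy(grown, rows, old_size);
    pfree(rows);
    rows = grown;

    /* repalloc pattern: the allocation may be extended in place, and
     * only the newly added tail needs zeroing. */
    rows = repalloc(rows, new_size);
    memset(rows + old_size, 0, new_size - old_size);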
From: Ashutosh B. <ash...@en...> - 2013-04-19 03:42:58
|
Then the perf is expected. There is too much IO on the same machine and too much context switch. On Thu, Apr 18, 2013 at 7:35 PM, Abbas Butt <abb...@en...>wrote: > All instances on the same machine. > > > On Thu, Apr 18, 2013 at 4:38 PM, Ashutosh Bapat < > ash...@en...> wrote: > >> Did you do it on true cluster or by running all instances on same >> machine? The later would degrade the performance. >> >> >> On Thu, Apr 18, 2013 at 4:38 PM, Abbas Butt <abb...@en...>wrote: >> >>> >>> >>> On Thu, Apr 18, 2013 at 8:43 AM, Ashutosh Bapat < >>> ash...@en...> wrote: >>> >>>> Did you measure the performance? >>>> >>> >>> I tried but I was getting very strange numbers , It took some hours but >>> reported >>> >>> Time: 365649.353 ms >>> >>> which comes out to be some 6 minutes, I am not sure why. >>> >>> >>>> >>>> >>>> On Thu, Apr 18, 2013 at 9:02 AM, Abbas Butt < >>>> abb...@en...> wrote: >>>> >>>>> >>>>> >>>>> On Thu, Apr 18, 2013 at 1:07 AM, Abbas Butt < >>>>> abb...@en...> wrote: >>>>> >>>>>> Hi, >>>>>> Here is the review of the patch. >>>>>> >>>>>> Overall the patch is good to go. I have reviewed the code and found >>>>>> some minor errors, which I corrected and have attached the revised patch >>>>>> with the mail. >>>>>> >>>>>> I have tested both the cases when the sort happens in memory and when >>>>>> it happens using disk and found both working. >>>>>> >>>>>> I agree that the approach used in the patch is cleaner and has >>>>>> smaller footprint. >>>>>> >>>>>> I have corrected some white space errors and an unintentional change >>>>>> in function set_dbcleanup_callback >>>>>> git apply /home/edb/Desktop/MergeSort/xc_sort.patch >>>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:539: trailing >>>>>> whitespace. >>>>>> void *fparams; >>>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1012: trailing >>>>>> whitespace. >>>>>> >>>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1018: trailing >>>>>> whitespace. >>>>>> >>>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1087: trailing >>>>>> whitespace. >>>>>> /* >>>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1228: trailing >>>>>> whitespace. >>>>>> size_t len, Oid msgnode_oid, >>>>>> warning: 5 lines add whitespace errors. >>>>>> >>>>>> I am leaving a query running for tonight which would sort 10M rows of >>>>>> a distributed table and would return top 100 of them. I would report its >>>>>> outcome tomorrow morning. >>>>>> >>>>> >>>>> It worked, here is the test case >>>>> >>>>> 1. create table test1 (id integer primary key , padding text); >>>>> 2. Load 10M rows >>>>> 3. select id from test1 order by 1 limit 100 >>>>> >>>>> >>>>> >>>>> >>>>>> >>>>>> Best Regards >>>>>> >>>>>> >>>>>> On Mon, Apr 1, 2013 at 11:02 AM, Koichi Suzuki < >>>>>> koi...@gm...> wrote: >>>>>> >>>>>>> Thanks. Then 90% improvement means about 53% of the duration, while >>>>>>> 50% means 67% of it. Number of queries in a given duration is 190 vs. >>>>>>> 150, difference is 40. >>>>>>> >>>>>>> Considering the needed resource, it may be okay to begin with >>>>>>> materialization. >>>>>>> >>>>>>> Any other inputs? >>>>>>> ---------- >>>>>>> Koichi Suzuki >>>>>>> >>>>>>> >>>>>>> 2013/4/1 Ashutosh Bapat <ash...@en...> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Apr 1, 2013 at 10:59 AM, Koichi Suzuki < >>>>>>>> koi...@gm...> wrote: >>>>>>>> >>>>>>>>> I understand materialize everything makes code clearer and >>>>>>>>> implementation becomes simpler and better structured. >>>>>>>>> >>>>>>>>> What do you mean by x% improvement? 
Does 90% improvement mean >>>>>>>>> the total duration is 10% of the original? >>>>>>>>> >>>>>>>> x% improvement means, duration reduces to 100/(100+x) as compared >>>>>>>> to the non-pushdown scenario. Or in simpler words, we see (100+x) queries >>>>>>>> being completed by pushdown approach in the same time in which nonpushdown >>>>>>>> approach completes 100 queries. >>>>>>>> >>>>>>>>> ---------- >>>>>>>>> Koichi Suzuki >>>>>>>>> >>>>>>>>> >>>>>>>>> 2013/3/29 Ashutosh Bapat <ash...@en...> >>>>>>>>> >>>>>>>>>> Hi All, >>>>>>>>>> I measured the scale up for both approaches - a. using datanode >>>>>>>>>> connections as tapes (existing one) b. materialising result on tapes before >>>>>>>>>> merging (the approach I proposed). For 1M rows, 5 coordinators I have found >>>>>>>>>> that approach (a) gives 90% improvement whereas approach (b) gives 50% >>>>>>>>>> improvement. Although the difference is significant, I feel that approach >>>>>>>>>> (b) is much cleaner than approach (a) and doesn't have large footprint >>>>>>>>>> compared to PG code and it takes care of all the cases like 1. >>>>>>>>>> materialising sorted result, 2. takes care of any number of datanode >>>>>>>>>> connections without memory overrun. It's possible to improve it further if >>>>>>>>>> we avoid materialisation of datanode result in tuplestore. >>>>>>>>>> >>>>>>>>>> Patch attached for reference. >>>>>>>>>> >>>>>>>>>> On Tue, Mar 26, 2013 at 10:38 AM, Ashutosh Bapat < >>>>>>>>>> ash...@en...> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Mar 26, 2013 at 10:19 AM, Koichi Suzuki < >>>>>>>>>>> koi...@gm...> wrote: >>>>>>>>>>> >>>>>>>>>>>> On thing we should think for option 1 is: >>>>>>>>>>>> >>>>>>>>>>>> When a number of the result is huge, applications has to wait >>>>>>>>>>>> long >>>>>>>>>>>> time until they get the first row. Because this option may >>>>>>>>>>>> need disk >>>>>>>>>>>> write, total resource consumption will be larger. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> Yes, I am aware of this fact. Please read the next paragraph and >>>>>>>>>>> you will see that the current situation is no better. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> I'm wondering if we can use "cursor" at database so that we can >>>>>>>>>>>> read >>>>>>>>>>>> each tape more simply, I mean, to leave each query node open >>>>>>>>>>>> and read >>>>>>>>>>>> next row from any query node. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> We do that right now. But because of such a simulated cursor >>>>>>>>>>> (it's not cursor per say, but we just fetch the required result from >>>>>>>>>>> connection as the demand arises in merging runs), we observer following >>>>>>>>>>> things >>>>>>>>>>> >>>>>>>>>>> If the plan has multiple remote query nodes (as there will be in >>>>>>>>>>> case of merge join), we assign the same connection to these nodes. Before >>>>>>>>>>> this assignment, the result from the previous connection is materialised at >>>>>>>>>>> the coordinator. This means that, when we will get huge result from the >>>>>>>>>>> datanode, it will be materialised (which will have the more cost as >>>>>>>>>>> materialising it on tape, as this materialisation happens in a linked list, >>>>>>>>>>> which is not optimized). We need to share connection between more than one >>>>>>>>>>> RemoteQuery node because same transaction can not work on two connections >>>>>>>>>>> to same server. Not only performance, but the code has become ugly because >>>>>>>>>>> of this approach. 
At various places in executor, we have special handling >>>>>>>>>>> for sorting, which needs to be maintained. >>>>>>>>>>> >>>>>>>>>>> Instead if we materialise all the result on tape and then >>>>>>>>>>> proceed with step D5 in Knuth's algorithm for polyphase merge sort, the >>>>>>>>>>> code will be much simpler and we won't loose much performance. In fact, we >>>>>>>>>>> might be able to leverage fetching bulk data on connection which can be >>>>>>>>>>> materialised on tape in bulk. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Regards; >>>>>>>>>>>> ---------- >>>>>>>>>>>> Koichi Suzuki >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 2013/3/25 Ashutosh Bapat <ash...@en...>: >>>>>>>>>>>> > Hi All, >>>>>>>>>>>> > I am working on using remote sorting for merge joins. The >>>>>>>>>>>> idea is while >>>>>>>>>>>> > using merge join at the coordinator, get the data sorted from >>>>>>>>>>>> the datanodes; >>>>>>>>>>>> > for replicated relations, we can get all the rows sorted and >>>>>>>>>>>> for distributed >>>>>>>>>>>> > tables we have to get sorted runs which can be merged at the >>>>>>>>>>>> coordinator. >>>>>>>>>>>> > For merge join the sorted inner relation needs to be randomly >>>>>>>>>>>> accessible. >>>>>>>>>>>> > For replicated relations this can be achieved by >>>>>>>>>>>> materialising the result. >>>>>>>>>>>> > But for distributed relations, we do not materialise the >>>>>>>>>>>> sorted result at >>>>>>>>>>>> > coordinator but compute the sorted result by merging the >>>>>>>>>>>> sorted results from >>>>>>>>>>>> > individual nodes on the fly. For distributed relations, the >>>>>>>>>>>> connection to >>>>>>>>>>>> > the datanodes themselves are used as logical tapes (which >>>>>>>>>>>> provide the sorted >>>>>>>>>>>> > runs). The final result is computed on the fly by choosing >>>>>>>>>>>> the smallest or >>>>>>>>>>>> > greatest row (as required) from the connections. >>>>>>>>>>>> > >>>>>>>>>>>> > For a Sort node the materialised result can reside in memory >>>>>>>>>>>> (if it fits >>>>>>>>>>>> > there) or on one of the logical tapes used for merge sort. >>>>>>>>>>>> So, in order to >>>>>>>>>>>> > provide random access to the sorted result, we need to >>>>>>>>>>>> materialise the >>>>>>>>>>>> > result either in the memory or on the logical tape. In-memory >>>>>>>>>>>> > materialisation is not easily possible since we have already >>>>>>>>>>>> resorted for >>>>>>>>>>>> > tape based sort, in case of distributed relations and to >>>>>>>>>>>> materialise the >>>>>>>>>>>> > result on tape, there is no logical tape available in current >>>>>>>>>>>> algorithm. To >>>>>>>>>>>> > make it work, there are following possible ways >>>>>>>>>>>> > >>>>>>>>>>>> > 1. When random access is required, materialise the sorted >>>>>>>>>>>> runs from >>>>>>>>>>>> > individual nodes onto tapes (one tape for each node) and then >>>>>>>>>>>> merge them on >>>>>>>>>>>> > one extra tape, which can be used for materialisation. >>>>>>>>>>>> > 2. Use a mix of connections and logical tape in the same tape >>>>>>>>>>>> set. Merge the >>>>>>>>>>>> > sorted runs from connections on a logical tape in the same >>>>>>>>>>>> logical tape set. >>>>>>>>>>>> > >>>>>>>>>>>> > While the second one looks attractive from performance >>>>>>>>>>>> perspective (it saves >>>>>>>>>>>> > writing and reading from the tape), it would make the merge >>>>>>>>>>>> code ugly by >>>>>>>>>>>> > using mixed tapes. 
The read calls for connection and logical >>>>>>>>>>>> tape are >>>>>>>>>>>> > different and we will need both on the logical tape where the >>>>>>>>>>>> final result >>>>>>>>>>>> > is materialized. So, I am thinking of going with 1, in fact, >>>>>>>>>>>> to have same >>>>>>>>>>>> > code to handle remote sort, use 1 in all cases (whether or not >>>>>>>>>>>> > materialization is required). >>>>>>>>>>>> > >>>>>>>>>>>> > Had original authors of remote sort code thought about this >>>>>>>>>>>> materialization? >>>>>>>>>>>> > Anything they can share on this topic? >>>>>>>>>>>> > Any comment? >>>>>>>>>>>> > -- >>>>>>>>>>>> > Best Wishes, >>>>>>>>>>>> > Ashutosh Bapat >>>>>>>>>>>> > EntepriseDB Corporation >>>>>>>>>>>> > The Enterprise Postgres Company >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> ------------------------------------------------------------------------------ >>>>>>>>>>>> > Everyone hates slow websites. So do we. >>>>>>>>>>>> > Make your web apps faster with AppDynamics >>>>>>>>>>>> > Download AppDynamics Lite for free today: >>>>>>>>>>>> > http://p.sf.net/sfu/appdyn_d2d_mar >>>>>>>>>>>> > _______________________________________________ >>>>>>>>>>>> > Postgres-xc-developers mailing list >>>>>>>>>>>> > Pos...@li... >>>>>>>>>>>> > >>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>>>>>>>>> > >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Best Wishes, >>>>>>>>>>> Ashutosh Bapat >>>>>>>>>>> EntepriseDB Corporation >>>>>>>>>>> The Enterprise Postgres Company >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Best Wishes, >>>>>>>>>> Ashutosh Bapat >>>>>>>>>> EntepriseDB Corporation >>>>>>>>>> The Enterprise Postgres Company >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Best Wishes, >>>>>>>> Ashutosh Bapat >>>>>>>> EntepriseDB Corporation >>>>>>>> The Enterprise Postgres Company >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> ------------------------------------------------------------------------------ >>>>>>> Own the Future-Intel® Level Up Game Demo Contest 2013 >>>>>>> Rise to greatness in Intel's independent game demo contest. >>>>>>> Compete for recognition, cash, and the chance to get your game >>>>>>> on Steam. $5K grand prize plus 10 genre and skill prizes. >>>>>>> Submit your demo by 6/6/13. http://p.sf.net/sfu/intel_levelupd2d >>>>>>> _______________________________________________ >>>>>>> Postgres-xc-developers mailing list >>>>>>> Pos...@li... >>>>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> -- >>>>>> Abbas >>>>>> Architect >>>>>> EnterpriseDB Corporation >>>>>> The Enterprise PostgreSQL Company >>>>>> >>>>>> Phone: 92-334-5100153 >>>>>> >>>>>> Website: www.enterprisedb.com >>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>>> >>>>>> This e-mail message (and any attachment) is intended for the use of >>>>>> the individual or entity to whom it is addressed. This message >>>>>> contains information from EnterpriseDB Corporation that may be >>>>>> privileged, confidential, or exempt from disclosure under applicable >>>>>> law. 
If you are not the intended recipient or authorized to receive >>>>>> this for the intended recipient, any use, dissemination, distribution, >>>>>> retention, archiving, or copying of this communication is strictly >>>>>> prohibited. If you have received this e-mail in error, please notify >>>>>> the sender immediately by reply e-mail and delete this message. >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> -- >>>>> Abbas >>>>> Architect >>>>> EnterpriseDB Corporation >>>>> The Enterprise PostgreSQL Company >>>>> >>>>> Phone: 92-334-5100153 >>>>> >>>>> Website: www.enterprisedb.com >>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>> >>>>> This e-mail message (and any attachment) is intended for the use of >>>>> the individual or entity to whom it is addressed. This message >>>>> contains information from EnterpriseDB Corporation that may be >>>>> privileged, confidential, or exempt from disclosure under applicable >>>>> law. If you are not the intended recipient or authorized to receive >>>>> this for the intended recipient, any use, dissemination, distribution, >>>>> retention, archiving, or copying of this communication is strictly >>>>> prohibited. If you have received this e-mail in error, please notify >>>>> the sender immediately by reply e-mail and delete this message. >>>>> >>>> >>>> >>>> >>>> -- >>>> Best Wishes, >>>> Ashutosh Bapat >>>> EntepriseDB Corporation >>>> The Enterprise Postgres Company >>>> >>> >>> >>> >>> -- >>> -- >>> Abbas >>> Architect >>> EnterpriseDB Corporation >>> The Enterprise PostgreSQL Company >>> >>> Phone: 92-334-5100153 >>> >>> Website: www.enterprisedb.com >>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>> >>> This e-mail message (and any attachment) is intended for the use of >>> the individual or entity to whom it is addressed. This message >>> contains information from EnterpriseDB Corporation that may be >>> privileged, confidential, or exempt from disclosure under applicable >>> law. If you are not the intended recipient or authorized to receive >>> this for the intended recipient, any use, dissemination, distribution, >>> retention, archiving, or copying of this communication is strictly >>> prohibited. If you have received this e-mail in error, please notify >>> the sender immediately by reply e-mail and delete this message. >>> >> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> > > > > -- > -- > Abbas > Architect > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > Phone: 92-334-5100153 > > Website: www.enterprisedb.com > EnterpriseDB Blog: http://blogs.enterprisedb.com/ > Follow us on Twitter: http://www.twitter.com/enterprisedb > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. 
If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
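Approach 1 in the quoted design boils down to draining each datanode connection onto its own store before the polyphase merge. A hedged backend-style sketch using the stock tuplestore API; fetch_from_conn() stands in for XC's actual connection-reading code and is not a real function:

    /* One store per datanode connection. randomAccess = true so the
     * materialised run can be rescanned, as merge join requires. */
    Tuplestorestate *store = tuplestore_begin_heap(true,   /* randomAccess */
                                                   false,  /* interXact */
                                                   work_mem);

    /* Drain the sorted run from the connection; the tuplestore spills
     * to disk by itself once work_mem is exceeded. */
    while (fetch_from_conn(conn, slot))        /* hypothetical reader */
        tuplestore_puttupleslot(store, slot);

    /* Later, during the merge, read the run back in order. */
    while (tuplestore_gettupleslot(store, true /* forward */,
                                   false /* copy */, slot))
        ;                                      /* feed slot to the merge */

    tuplestore_end(store);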
From: Abbas B. <abb...@en...> - 2013-04-18 14:05:22
|
All instances on the same machine. On Thu, Apr 18, 2013 at 4:38 PM, Ashutosh Bapat < ash...@en...> wrote: > Did you do it on true cluster or by running all instances on same machine? > The later would degrade the performance. > > > On Thu, Apr 18, 2013 at 4:38 PM, Abbas Butt <abb...@en...>wrote: > >> >> >> On Thu, Apr 18, 2013 at 8:43 AM, Ashutosh Bapat < >> ash...@en...> wrote: >> >>> Did you measure the performance? >>> >> >> I tried but I was getting very strange numbers , It took some hours but >> reported >> >> Time: 365649.353 ms >> >> which comes out to be some 6 minutes, I am not sure why. >> >> >>> >>> >>> On Thu, Apr 18, 2013 at 9:02 AM, Abbas Butt <abb...@en... >>> > wrote: >>> >>>> >>>> >>>> On Thu, Apr 18, 2013 at 1:07 AM, Abbas Butt < >>>> abb...@en...> wrote: >>>> >>>>> Hi, >>>>> Here is the review of the patch. >>>>> >>>>> Overall the patch is good to go. I have reviewed the code and found >>>>> some minor errors, which I corrected and have attached the revised patch >>>>> with the mail. >>>>> >>>>> I have tested both the cases when the sort happens in memory and when >>>>> it happens using disk and found both working. >>>>> >>>>> I agree that the approach used in the patch is cleaner and has smaller >>>>> footprint. >>>>> >>>>> I have corrected some white space errors and an unintentional change >>>>> in function set_dbcleanup_callback >>>>> git apply /home/edb/Desktop/MergeSort/xc_sort.patch >>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:539: trailing whitespace. >>>>> void *fparams; >>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1012: trailing >>>>> whitespace. >>>>> >>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1018: trailing >>>>> whitespace. >>>>> >>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1087: trailing >>>>> whitespace. >>>>> /* >>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1228: trailing >>>>> whitespace. >>>>> size_t len, Oid msgnode_oid, >>>>> warning: 5 lines add whitespace errors. >>>>> >>>>> I am leaving a query running for tonight which would sort 10M rows of >>>>> a distributed table and would return top 100 of them. I would report its >>>>> outcome tomorrow morning. >>>>> >>>> >>>> It worked, here is the test case >>>> >>>> 1. create table test1 (id integer primary key , padding text); >>>> 2. Load 10M rows >>>> 3. select id from test1 order by 1 limit 100 >>>> >>>> >>>> >>>> >>>>> >>>>> Best Regards >>>>> >>>>> >>>>> On Mon, Apr 1, 2013 at 11:02 AM, Koichi Suzuki < >>>>> koi...@gm...> wrote: >>>>> >>>>>> Thanks. Then 90% improvement means about 53% of the duration, while >>>>>> 50% means 67% of it. Number of queries in a given duration is 190 vs. >>>>>> 150, difference is 40. >>>>>> >>>>>> Considering the needed resource, it may be okay to begin with >>>>>> materialization. >>>>>> >>>>>> Any other inputs? >>>>>> ---------- >>>>>> Koichi Suzuki >>>>>> >>>>>> >>>>>> 2013/4/1 Ashutosh Bapat <ash...@en...> >>>>>> >>>>>>> >>>>>>> >>>>>>> On Mon, Apr 1, 2013 at 10:59 AM, Koichi Suzuki < >>>>>>> koi...@gm...> wrote: >>>>>>> >>>>>>>> I understand materialize everything makes code clearer and >>>>>>>> implementation becomes simpler and better structured. >>>>>>>> >>>>>>>> What do you mean by x% improvement? Does 90% improvement mean the >>>>>>>> total duration is 10% of the original? >>>>>>>> >>>>>>> x% improvement means, duration reduces to 100/(100+x) as compared to >>>>>>> the non-pushdown scenario. 
Or in simpler words, we see (100+x) queries >>>>>>> being completed by pushdown approach in the same time in which nonpushdown >>>>>>> approach completes 100 queries. >>>>>>> >>>>>>>> ---------- >>>>>>>> Koichi Suzuki >>>>>>>> >>>>>>>> >>>>>>>> 2013/3/29 Ashutosh Bapat <ash...@en...> >>>>>>>> >>>>>>>>> Hi All, >>>>>>>>> I measured the scale up for both approaches - a. using datanode >>>>>>>>> connections as tapes (existing one) b. materialising result on tapes before >>>>>>>>> merging (the approach I proposed). For 1M rows, 5 coordinators I have found >>>>>>>>> that approach (a) gives 90% improvement whereas approach (b) gives 50% >>>>>>>>> improvement. Although the difference is significant, I feel that approach >>>>>>>>> (b) is much cleaner than approach (a) and doesn't have large footprint >>>>>>>>> compared to PG code and it takes care of all the cases like 1. >>>>>>>>> materialising sorted result, 2. takes care of any number of datanode >>>>>>>>> connections without memory overrun. It's possible to improve it further if >>>>>>>>> we avoid materialisation of datanode result in tuplestore. >>>>>>>>> >>>>>>>>> Patch attached for reference. >>>>>>>>> >>>>>>>>> On Tue, Mar 26, 2013 at 10:38 AM, Ashutosh Bapat < >>>>>>>>> ash...@en...> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Mar 26, 2013 at 10:19 AM, Koichi Suzuki < >>>>>>>>>> koi...@gm...> wrote: >>>>>>>>>> >>>>>>>>>>> On thing we should think for option 1 is: >>>>>>>>>>> >>>>>>>>>>> When a number of the result is huge, applications has to wait >>>>>>>>>>> long >>>>>>>>>>> time until they get the first row. Because this option may need >>>>>>>>>>> disk >>>>>>>>>>> write, total resource consumption will be larger. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> Yes, I am aware of this fact. Please read the next paragraph and >>>>>>>>>> you will see that the current situation is no better. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> I'm wondering if we can use "cursor" at database so that we can >>>>>>>>>>> read >>>>>>>>>>> each tape more simply, I mean, to leave each query node open and >>>>>>>>>>> read >>>>>>>>>>> next row from any query node. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> We do that right now. But because of such a simulated cursor >>>>>>>>>> (it's not cursor per say, but we just fetch the required result from >>>>>>>>>> connection as the demand arises in merging runs), we observer following >>>>>>>>>> things >>>>>>>>>> >>>>>>>>>> If the plan has multiple remote query nodes (as there will be in >>>>>>>>>> case of merge join), we assign the same connection to these nodes. Before >>>>>>>>>> this assignment, the result from the previous connection is materialised at >>>>>>>>>> the coordinator. This means that, when we will get huge result from the >>>>>>>>>> datanode, it will be materialised (which will have the more cost as >>>>>>>>>> materialising it on tape, as this materialisation happens in a linked list, >>>>>>>>>> which is not optimized). We need to share connection between more than one >>>>>>>>>> RemoteQuery node because same transaction can not work on two connections >>>>>>>>>> to same server. Not only performance, but the code has become ugly because >>>>>>>>>> of this approach. At various places in executor, we have special handling >>>>>>>>>> for sorting, which needs to be maintained. >>>>>>>>>> >>>>>>>>>> Instead if we materialise all the result on tape and then proceed >>>>>>>>>> with step D5 in Knuth's algorithm for polyphase merge sort, the code will >>>>>>>>>> be much simpler and we won't loose much performance. 
In fact, we might be >>>>>>>>>> able to leverage fetching bulk data on connection which can be materialised >>>>>>>>>> on tape in bulk. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Regards; >>>>>>>>>>> ---------- >>>>>>>>>>> Koichi Suzuki >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 2013/3/25 Ashutosh Bapat <ash...@en...>: >>>>>>>>>>> > Hi All, >>>>>>>>>>> > I am working on using remote sorting for merge joins. The idea >>>>>>>>>>> is while >>>>>>>>>>> > using merge join at the coordinator, get the data sorted from >>>>>>>>>>> the datanodes; >>>>>>>>>>> > for replicated relations, we can get all the rows sorted and >>>>>>>>>>> for distributed >>>>>>>>>>> > tables we have to get sorted runs which can be merged at the >>>>>>>>>>> coordinator. >>>>>>>>>>> > For merge join the sorted inner relation needs to be randomly >>>>>>>>>>> accessible. >>>>>>>>>>> > For replicated relations this can be achieved by materialising >>>>>>>>>>> the result. >>>>>>>>>>> > But for distributed relations, we do not materialise the >>>>>>>>>>> sorted result at >>>>>>>>>>> > coordinator but compute the sorted result by merging the >>>>>>>>>>> sorted results from >>>>>>>>>>> > individual nodes on the fly. For distributed relations, the >>>>>>>>>>> connection to >>>>>>>>>>> > the datanodes themselves are used as logical tapes (which >>>>>>>>>>> provide the sorted >>>>>>>>>>> > runs). The final result is computed on the fly by choosing the >>>>>>>>>>> smallest or >>>>>>>>>>> > greatest row (as required) from the connections. >>>>>>>>>>> > >>>>>>>>>>> > For a Sort node the materialised result can reside in memory >>>>>>>>>>> (if it fits >>>>>>>>>>> > there) or on one of the logical tapes used for merge sort. So, >>>>>>>>>>> in order to >>>>>>>>>>> > provide random access to the sorted result, we need to >>>>>>>>>>> materialise the >>>>>>>>>>> > result either in the memory or on the logical tape. In-memory >>>>>>>>>>> > materialisation is not easily possible since we have already >>>>>>>>>>> resorted for >>>>>>>>>>> > tape based sort, in case of distributed relations and to >>>>>>>>>>> materialise the >>>>>>>>>>> > result on tape, there is no logical tape available in current >>>>>>>>>>> algorithm. To >>>>>>>>>>> > make it work, there are following possible ways >>>>>>>>>>> > >>>>>>>>>>> > 1. When random access is required, materialise the sorted runs >>>>>>>>>>> from >>>>>>>>>>> > individual nodes onto tapes (one tape for each node) and then >>>>>>>>>>> merge them on >>>>>>>>>>> > one extra tape, which can be used for materialisation. >>>>>>>>>>> > 2. Use a mix of connections and logical tape in the same tape >>>>>>>>>>> set. Merge the >>>>>>>>>>> > sorted runs from connections on a logical tape in the same >>>>>>>>>>> logical tape set. >>>>>>>>>>> > >>>>>>>>>>> > While the second one looks attractive from performance >>>>>>>>>>> perspective (it saves >>>>>>>>>>> > writing and reading from the tape), it would make the merge >>>>>>>>>>> code ugly by >>>>>>>>>>> > using mixed tapes. The read calls for connection and logical >>>>>>>>>>> tape are >>>>>>>>>>> > different and we will need both on the logical tape where the >>>>>>>>>>> final result >>>>>>>>>>> > is materialized. So, I am thinking of going with 1, in fact, >>>>>>>>>>> to have same >>>>>>>>>>> > code to handle remote sort, use 1 in all cases (whether or not >>>>>>>>>>> > materialization is required). >>>>>>>>>>> > >>>>>>>>>>> > Had original authors of remote sort code thought about this >>>>>>>>>>> materialization? >>>>>>>>>>> > Anything they can share on this topic? 
>>>>>>>>>>> > Any comment? >>>>>>>>>>> > -- >>>>>>>>>>> > Best Wishes, >>>>>>>>>>> > Ashutosh Bapat >>>>>>>>>>> > EntepriseDB Corporation >>>>>>>>>>> > The Enterprise Postgres Company >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> ------------------------------------------------------------------------------ >>>>>>>>>>> > Everyone hates slow websites. So do we. >>>>>>>>>>> > Make your web apps faster with AppDynamics >>>>>>>>>>> > Download AppDynamics Lite for free today: >>>>>>>>>>> > http://p.sf.net/sfu/appdyn_d2d_mar >>>>>>>>>>> > _______________________________________________ >>>>>>>>>>> > Postgres-xc-developers mailing list >>>>>>>>>>> > Pos...@li... >>>>>>>>>>> > >>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>>>>>>>> > >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Best Wishes, >>>>>>>>>> Ashutosh Bapat >>>>>>>>>> EntepriseDB Corporation >>>>>>>>>> The Enterprise Postgres Company >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Best Wishes, >>>>>>>>> Ashutosh Bapat >>>>>>>>> EntepriseDB Corporation >>>>>>>>> The Enterprise Postgres Company >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Best Wishes, >>>>>>> Ashutosh Bapat >>>>>>> EntepriseDB Corporation >>>>>>> The Enterprise Postgres Company >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> ------------------------------------------------------------------------------ >>>>>> Own the Future-Intel® Level Up Game Demo Contest 2013 >>>>>> Rise to greatness in Intel's independent game demo contest. >>>>>> Compete for recognition, cash, and the chance to get your game >>>>>> on Steam. $5K grand prize plus 10 genre and skill prizes. >>>>>> Submit your demo by 6/6/13. http://p.sf.net/sfu/intel_levelupd2d >>>>>> _______________________________________________ >>>>>> Postgres-xc-developers mailing list >>>>>> Pos...@li... >>>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> -- >>>>> Abbas >>>>> Architect >>>>> EnterpriseDB Corporation >>>>> The Enterprise PostgreSQL Company >>>>> >>>>> Phone: 92-334-5100153 >>>>> >>>>> Website: www.enterprisedb.com >>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>>> >>>>> This e-mail message (and any attachment) is intended for the use of >>>>> the individual or entity to whom it is addressed. This message >>>>> contains information from EnterpriseDB Corporation that may be >>>>> privileged, confidential, or exempt from disclosure under applicable >>>>> law. If you are not the intended recipient or authorized to receive >>>>> this for the intended recipient, any use, dissemination, distribution, >>>>> retention, archiving, or copying of this communication is strictly >>>>> prohibited. If you have received this e-mail in error, please notify >>>>> the sender immediately by reply e-mail and delete this message. 
>>>>> >>>> >>>> >>>> >>>> -- >>>> -- >>>> Abbas >>>> Architect >>>> EnterpriseDB Corporation >>>> The Enterprise PostgreSQL Company >>>> >>>> Phone: 92-334-5100153 >>>> >>>> Website: www.enterprisedb.com >>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>> >>>> This e-mail message (and any attachment) is intended for the use of >>>> the individual or entity to whom it is addressed. This message >>>> contains information from EnterpriseDB Corporation that may be >>>> privileged, confidential, or exempt from disclosure under applicable >>>> law. If you are not the intended recipient or authorized to receive >>>> this for the intended recipient, any use, dissemination, distribution, >>>> retention, archiving, or copying of this communication is strictly >>>> prohibited. If you have received this e-mail in error, please notify >>>> the sender immediately by reply e-mail and delete this message. >>>> >>> >>> >>> >>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Enterprise Postgres Company >>> >> >> >> >> -- >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> >> Phone: 92-334-5100153 >> >> Website: www.enterprisedb.com >> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >> Follow us on Twitter: http://www.twitter.com/enterprisedb >> >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. >> > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > -- -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: 92-334-5100153 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Amit K. <ami...@en...> - 2013-04-18 13:59:49
|
Commit id 5aa7c108246079b2fee86512e025a2bd1454d87c addresses these points. Details below. On 5 April 2013 16:32, Ashutosh Bapat <ash...@en...> wrote: > Hi Amit, > This isn't your change, but the prologue of function > SetDataRowForIntParams uses a non-standard format, adding "----" at the > start and end of the prologue. Please remove those. > Done. > > sourceSlot and newSlot seem to be generic names in the context of the function > SetDataRowForIntParams(), which says "Form a bind row for internal > parameters". Since the function is going to change a bit, taking some data > from one slot and some from the other to create the data row, I think we should > rename the variables/functions to convey the changed semantics. > I could not find any better names other than sourceSlot and dataSlot. I have renamed newSlot to dataSlot. There is also the context of DML in this function, and in that context these names convey the meaning. > > Please add prologues for the functions append_val() and append_junkval(). These > functions need better names like append_paramval or append_param_junkval, > etc.; better if you add the pgxc_ prefix that we are adding for all XC-specific > functions. > Done. > > Instead of having the macro SET_PARAM_TYPES (which makes debugging difficult), > can you please use a function? In fact, can we set the parameters just after > setting paramtypes_set? > Done. In another commit: 1539b54bc24c85639ff0cf76042db0098a189e31. That commit was actually for a bug I discovered in binding the parameters. > > I see we have added jf_xc_node_id and jf_whole_row as storage for > the attribute numbers of the corresponding fields from the source tuple. Please add > some comments specifying their usage (maybe why we need these extra fields > apart from jf_junkAttNo). > Done. > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company |
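For the SET_PARAM_TYPES point, the usual motivation is that a function gives a breakpoint and argument type checking where a macro gives neither. A tiny illustrative before/after; the struct and field names are invented, not the actual XC definitions:

    typedef unsigned int Oid;

    typedef struct ParamTypeInfo   /* illustrative stand-in */
    {
        Oid *param_types;
        int  num_params;
    } ParamTypeInfo;

    /* Before: a macro - no breakpoint, arguments unchecked. */
    #define SET_PARAM_TYPES(p, t, n) \
        do { (p)->param_types = (t); (p)->num_params = (n); } while (0)

    /* After: a plain function - debuggable and type-checked. */
    static void
    pgxc_set_param_types(ParamTypeInfo *p, Oid *types, int n)
    {
        p->param_types = types;
        p->num_params = n;
    }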
From: Ashutosh B. <ash...@en...> - 2013-04-18 11:38:52
|
Did you do it on true cluster or by running all instances on same machine? The later would degrade the performance. On Thu, Apr 18, 2013 at 4:38 PM, Abbas Butt <abb...@en...>wrote: > > > On Thu, Apr 18, 2013 at 8:43 AM, Ashutosh Bapat < > ash...@en...> wrote: > >> Did you measure the performance? >> > > I tried but I was getting very strange numbers , It took some hours but > reported > > Time: 365649.353 ms > > which comes out to be some 6 minutes, I am not sure why. > > >> >> >> On Thu, Apr 18, 2013 at 9:02 AM, Abbas Butt <abb...@en...>wrote: >> >>> >>> >>> On Thu, Apr 18, 2013 at 1:07 AM, Abbas Butt <abb...@en... >>> > wrote: >>> >>>> Hi, >>>> Here is the review of the patch. >>>> >>>> Overall the patch is good to go. I have reviewed the code and found >>>> some minor errors, which I corrected and have attached the revised patch >>>> with the mail. >>>> >>>> I have tested both the cases when the sort happens in memory and when >>>> it happens using disk and found both working. >>>> >>>> I agree that the approach used in the patch is cleaner and has smaller >>>> footprint. >>>> >>>> I have corrected some white space errors and an unintentional change in >>>> function set_dbcleanup_callback >>>> git apply /home/edb/Desktop/MergeSort/xc_sort.patch >>>> /home/edb/Desktop/MergeSort/xc_sort.patch:539: trailing whitespace. >>>> void *fparams; >>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1012: trailing whitespace. >>>> >>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1018: trailing whitespace. >>>> >>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1087: trailing whitespace. >>>> /* >>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1228: trailing whitespace. >>>> size_t len, Oid msgnode_oid, >>>> warning: 5 lines add whitespace errors. >>>> >>>> I am leaving a query running for tonight which would sort 10M rows of a >>>> distributed table and would return top 100 of them. I would report its >>>> outcome tomorrow morning. >>>> >>> >>> It worked, here is the test case >>> >>> 1. create table test1 (id integer primary key , padding text); >>> 2. Load 10M rows >>> 3. select id from test1 order by 1 limit 100 >>> >>> >>> >>> >>>> >>>> Best Regards >>>> >>>> >>>> On Mon, Apr 1, 2013 at 11:02 AM, Koichi Suzuki < >>>> koi...@gm...> wrote: >>>> >>>>> Thanks. Then 90% improvement means about 53% of the duration, while >>>>> 50% means 67% of it. Number of queries in a given duration is 190 vs. >>>>> 150, difference is 40. >>>>> >>>>> Considering the needed resource, it may be okay to begin with >>>>> materialization. >>>>> >>>>> Any other inputs? >>>>> ---------- >>>>> Koichi Suzuki >>>>> >>>>> >>>>> 2013/4/1 Ashutosh Bapat <ash...@en...> >>>>> >>>>>> >>>>>> >>>>>> On Mon, Apr 1, 2013 at 10:59 AM, Koichi Suzuki < >>>>>> koi...@gm...> wrote: >>>>>> >>>>>>> I understand materialize everything makes code clearer and >>>>>>> implementation becomes simpler and better structured. >>>>>>> >>>>>>> What do you mean by x% improvement? Does 90% improvement mean the >>>>>>> total duration is 10% of the original? >>>>>>> >>>>>> x% improvement means, duration reduces to 100/(100+x) as compared to >>>>>> the non-pushdown scenario. Or in simpler words, we see (100+x) queries >>>>>> being completed by pushdown approach in the same time in which nonpushdown >>>>>> approach completes 100 queries. >>>>>> >>>>>>> ---------- >>>>>>> Koichi Suzuki >>>>>>> >>>>>>> >>>>>>> 2013/3/29 Ashutosh Bapat <ash...@en...> >>>>>>> >>>>>>>> Hi All, >>>>>>>> I measured the scale up for both approaches - a. 
using datanode >>>>>>>> connections as tapes (existing one) b. materialising result on tapes before >>>>>>>> merging (the approach I proposed). For 1M rows, 5 coordinators I have found >>>>>>>> that approach (a) gives 90% improvement whereas approach (b) gives 50% >>>>>>>> improvement. Although the difference is significant, I feel that approach >>>>>>>> (b) is much cleaner than approach (a) and doesn't have large footprint >>>>>>>> compared to PG code and it takes care of all the cases like 1. >>>>>>>> materialising sorted result, 2. takes care of any number of datanode >>>>>>>> connections without memory overrun. It's possible to improve it further if >>>>>>>> we avoid materialisation of datanode result in tuplestore. >>>>>>>> >>>>>>>> Patch attached for reference. >>>>>>>> >>>>>>>> On Tue, Mar 26, 2013 at 10:38 AM, Ashutosh Bapat < >>>>>>>> ash...@en...> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Mar 26, 2013 at 10:19 AM, Koichi Suzuki < >>>>>>>>> koi...@gm...> wrote: >>>>>>>>> >>>>>>>>>> On thing we should think for option 1 is: >>>>>>>>>> >>>>>>>>>> When a number of the result is huge, applications has to wait long >>>>>>>>>> time until they get the first row. Because this option may need >>>>>>>>>> disk >>>>>>>>>> write, total resource consumption will be larger. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> Yes, I am aware of this fact. Please read the next paragraph and >>>>>>>>> you will see that the current situation is no better. >>>>>>>>> >>>>>>>>> >>>>>>>>>> I'm wondering if we can use "cursor" at database so that we can >>>>>>>>>> read >>>>>>>>>> each tape more simply, I mean, to leave each query node open and >>>>>>>>>> read >>>>>>>>>> next row from any query node. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> We do that right now. But because of such a simulated cursor (it's >>>>>>>>> not cursor per say, but we just fetch the required result from connection >>>>>>>>> as the demand arises in merging runs), we observer following things >>>>>>>>> >>>>>>>>> If the plan has multiple remote query nodes (as there will be in >>>>>>>>> case of merge join), we assign the same connection to these nodes. Before >>>>>>>>> this assignment, the result from the previous connection is materialised at >>>>>>>>> the coordinator. This means that, when we will get huge result from the >>>>>>>>> datanode, it will be materialised (which will have the more cost as >>>>>>>>> materialising it on tape, as this materialisation happens in a linked list, >>>>>>>>> which is not optimized). We need to share connection between more than one >>>>>>>>> RemoteQuery node because same transaction can not work on two connections >>>>>>>>> to same server. Not only performance, but the code has become ugly because >>>>>>>>> of this approach. At various places in executor, we have special handling >>>>>>>>> for sorting, which needs to be maintained. >>>>>>>>> >>>>>>>>> Instead if we materialise all the result on tape and then proceed >>>>>>>>> with step D5 in Knuth's algorithm for polyphase merge sort, the code will >>>>>>>>> be much simpler and we won't loose much performance. In fact, we might be >>>>>>>>> able to leverage fetching bulk data on connection which can be materialised >>>>>>>>> on tape in bulk. >>>>>>>>> >>>>>>>>> >>>>>>>>>> Regards; >>>>>>>>>> ---------- >>>>>>>>>> Koichi Suzuki >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2013/3/25 Ashutosh Bapat <ash...@en...>: >>>>>>>>>> > Hi All, >>>>>>>>>> > I am working on using remote sorting for merge joins. 
The idea >>>>>>>>>> is while >>>>>>>>>> > using merge join at the coordinator, get the data sorted from >>>>>>>>>> the datanodes; >>>>>>>>>> > for replicated relations, we can get all the rows sorted and >>>>>>>>>> for distributed >>>>>>>>>> > tables we have to get sorted runs which can be merged at the >>>>>>>>>> coordinator. >>>>>>>>>> > For merge join the sorted inner relation needs to be randomly >>>>>>>>>> accessible. >>>>>>>>>> > For replicated relations this can be achieved by materialising >>>>>>>>>> the result. >>>>>>>>>> > But for distributed relations, we do not materialise the sorted >>>>>>>>>> result at >>>>>>>>>> > coordinator but compute the sorted result by merging the sorted >>>>>>>>>> results from >>>>>>>>>> > individual nodes on the fly. For distributed relations, the >>>>>>>>>> connection to >>>>>>>>>> > the datanodes themselves are used as logical tapes (which >>>>>>>>>> provide the sorted >>>>>>>>>> > runs). The final result is computed on the fly by choosing the >>>>>>>>>> smallest or >>>>>>>>>> > greatest row (as required) from the connections. >>>>>>>>>> > >>>>>>>>>> > For a Sort node the materialised result can reside in memory >>>>>>>>>> (if it fits >>>>>>>>>> > there) or on one of the logical tapes used for merge sort. So, >>>>>>>>>> in order to >>>>>>>>>> > provide random access to the sorted result, we need to >>>>>>>>>> materialise the >>>>>>>>>> > result either in the memory or on the logical tape. In-memory >>>>>>>>>> > materialisation is not easily possible since we have already >>>>>>>>>> resorted for >>>>>>>>>> > tape based sort, in case of distributed relations and to >>>>>>>>>> materialise the >>>>>>>>>> > result on tape, there is no logical tape available in current >>>>>>>>>> algorithm. To >>>>>>>>>> > make it work, there are following possible ways >>>>>>>>>> > >>>>>>>>>> > 1. When random access is required, materialise the sorted runs >>>>>>>>>> from >>>>>>>>>> > individual nodes onto tapes (one tape for each node) and then >>>>>>>>>> merge them on >>>>>>>>>> > one extra tape, which can be used for materialisation. >>>>>>>>>> > 2. Use a mix of connections and logical tape in the same tape >>>>>>>>>> set. Merge the >>>>>>>>>> > sorted runs from connections on a logical tape in the same >>>>>>>>>> logical tape set. >>>>>>>>>> > >>>>>>>>>> > While the second one looks attractive from performance >>>>>>>>>> perspective (it saves >>>>>>>>>> > writing and reading from the tape), it would make the merge >>>>>>>>>> code ugly by >>>>>>>>>> > using mixed tapes. The read calls for connection and logical >>>>>>>>>> tape are >>>>>>>>>> > different and we will need both on the logical tape where the >>>>>>>>>> final result >>>>>>>>>> > is materialized. So, I am thinking of going with 1, in fact, to >>>>>>>>>> have same >>>>>>>>>> > code to handle remote sort, use 1 in all cases (whether or not >>>>>>>>>> > materialization is required). >>>>>>>>>> > >>>>>>>>>> > Had original authors of remote sort code thought about this >>>>>>>>>> materialization? >>>>>>>>>> > Anything they can share on this topic? >>>>>>>>>> > Any comment? >>>>>>>>>> > -- >>>>>>>>>> > Best Wishes, >>>>>>>>>> > Ashutosh Bapat >>>>>>>>>> > EntepriseDB Corporation >>>>>>>>>> > The Enterprise Postgres Company >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> ------------------------------------------------------------------------------ >>>>>>>>>> > Everyone hates slow websites. So do we. 
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best Wishes,
>>>>>>>>> Ashutosh Bapat
>>>>>>>>> EntepriseDB Corporation
>>>>>>>>> The Enterprise Postgres Company
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Wishes,
>>>>>>>> Ashutosh Bapat
>>>>>>>> EntepriseDB Corporation
>>>>>>>> The Enterprise Postgres Company
>>>>>>
>>>>>> --
>>>>>> Best Wishes,
>>>>>> Ashutosh Bapat
>>>>>> EntepriseDB Corporation
>>>>>> The Enterprise Postgres Company
>>>>
>>>> --
>>>> Abbas
>>>> Architect
>>>> EnterpriseDB Corporation
>>>> The Enterprise PostgreSQL Company
>>>>
>>>> Phone: 92-334-5100153
>>>>
>>>> Website: www.enterprisedb.com
>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/
>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb
>>>
>>> --
>>> Abbas
>>> Architect
>>> EnterpriseDB Corporation
>>> The Enterprise PostgreSQL Company
>>>
>>> Phone: 92-334-5100153
>>>
>>> Website: www.enterprisedb.com
>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/
>>> Follow us on Twitter: http://www.twitter.com/enterprisedb
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
>> EntepriseDB Corporation
>> The Enterprise Postgres Company
>
> --
> Abbas
> Architect
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company
>
> Phone: 92-334-5100153
>
> Website: www.enterprisedb.com
> EnterpriseDB Blog: http://blogs.enterprisedb.com/
> Follow us on Twitter: http://www.twitter.com/enterprisedb

--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Enterprise Postgres Company
|
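
The merge step discussed in the thread above, repeatedly picking the smallest head row across several sorted runs, is a k-way merge driven by a small binary heap, whether the runs arrive over datanode connections or from on-disk tapes. The following minimal, self-contained C sketch illustrates only the technique; the Run struct and the int arrays are illustrative stand-ins, not the actual Postgres-XC connection or logical-tape API.

/*
 * k-way merge sketch: pick the smallest head element across k sorted
 * runs, as the coordinator does when merging sorted streams.  Runs are
 * plain int arrays here; in the real executor each run would be a
 * datanode connection or a logical tape.
 */
#include <stdio.h>

typedef struct Run { const int *vals; int len; int pos; } Run;

static Run *runs;   /* shared with the heap comparisons below */

static int head(int r) { return runs[r].vals[runs[r].pos]; }

/* restore the min-heap property for the run index at position i */
static void sift_down(int *heap, int n, int i)
{
    for (;;)
    {
        int l = 2 * i + 1, r = 2 * i + 2, m = i;
        if (l < n && head(heap[l]) < head(heap[m])) m = l;
        if (r < n && head(heap[r]) < head(heap[m])) m = r;
        if (m == i) return;
        int tmp = heap[i]; heap[i] = heap[m]; heap[m] = tmp;
        i = m;
    }
}

int main(void)
{
    int a[] = {1, 4, 9}, b[] = {2, 3, 10}, c[] = {5, 6, 7};
    Run rs[] = {{a, 3, 0}, {b, 3, 0}, {c, 3, 0}};
    int heap[] = {0, 1, 2};
    int n = 3;

    runs = rs;
    /* build the heap over the k run heads */
    for (int i = n / 2 - 1; i >= 0; i--)
        sift_down(heap, n, i);

    /* repeatedly emit the smallest head; advance or drop its run */
    while (n > 0)
    {
        int r = heap[0];
        printf("%d ", head(r));
        if (++runs[r].pos == runs[r].len)
            heap[0] = heap[--n];        /* run exhausted */
        sift_down(heap, n, 0);
    }
    putchar('\n');
    return 0;
}

Compiled as C99, this prints 1 2 3 4 5 6 7 9 10. The loop shape is the same in either approach from the thread; only head() would read from a connection buffer or a materialised tape instead of an array.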
From: Amit K. <ami...@en...> - 2013-04-18 11:36:07
|
On 18 April 2013 02:43, Abbas Butt <abb...@en...> wrote:
> While doing some testing I found that just checking
> res == LOCKACQUIRE_OK
> is not enough; the function should succeed if the lock is already held
> too, hence the condition should be
> (res == LOCKACQUIRE_OK || res == LOCKACQUIRE_ALREADY_HELD)
> The attached patch corrects this problem.

This looks fine to me.

>
> On Mon, Apr 8, 2013 at 11:23 AM, Abbas Butt <abb...@en...> wrote:
>
>> Thanks. I will commit it later today.
>>
>> On Mon, Apr 8, 2013 at 9:52 AM, Amit Khandekar <
>> ami...@en...> wrote:
>>
>>> Hi Abbas,
>>>
>>> The patch looks good to go.
>>>
>>> -Amit
>>>
>>> On 6 April 2013 01:02, Abbas Butt <abb...@en...> wrote:
>>>
>>>> Hi,
>>>>
>>>> Consider this test case when run on a single-coordinator cluster.
>>>>
>>>> From one session acquire a lock:
>>>>
>>>> edb@edb-virtual-machine:/usr/local/pgsql/bin$ ./psql postgres
>>>> psql (PGXC 1.1devel, based on PG 9.2beta2)
>>>> Type "help" for help.
>>>>
>>>> postgres=# select pg_try_advisory_lock(1234,5678);
>>>>  pg_try_advisory_lock
>>>> ----------------------
>>>>  t
>>>> (1 row)
>>>>
>>>> and from another terminal try to acquire the same lock:
>>>>
>>>> edb@edb-virtual-machine:/usr/local/pgsql/bin$ ./psql postgres
>>>> psql (PGXC 1.1devel, based on PG 9.2beta2)
>>>> Type "help" for help.
>>>>
>>>> postgres=# select pg_try_advisory_lock(1234,5678);
>>>>  pg_try_advisory_lock
>>>> ----------------------
>>>>  t
>>>> (1 row)
>>>>
>>>> Note that the second request succeeds, whereas the lock is already
>>>> held by the first session.
>>>>
>>>> The problem is that pgxc_advisory_lock neglects the return value of
>>>> the LockAcquire function in the case of a single coordinator.
>>>> The attached patch corrects the problem.
>>>>
>>>> Comments are welcome.
>>>>
>>>> --
>>>> Abbas
>>>> Architect
>>>> EnterpriseDB Corporation
>>>> The Enterprise PostgreSQL Company
>>>>
>>>> Phone: 92-334-5100153
>>>>
>>>> Website: www.enterprisedb.com
>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/
>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb
>>
>> --
>> Abbas
>> Architect
>> EnterpriseDB Corporation
>> The Enterprise PostgreSQL Company
>>
>> Phone: 92-334-5100153
>>
>> Website: www.enterprisedb.com
>> EnterpriseDB Blog: http://blogs.enterprisedb.com/
>> Follow us on Twitter: http://www.twitter.com/enterprisedb
>
> --
> Abbas
> Architect
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company
>
> Phone: 92-334-5100153
>
> Website: www.enterprisedb.com
> EnterpriseDB Blog: http://blogs.enterprisedb.com/
> Follow us on Twitter: http://www.twitter.com/enterprisedb
|
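
The shape of the corrected check is easy to demonstrate in isolation. In the sketch below the enum mirrors PostgreSQL's LockAcquireResult values, while try_acquire() is only a hypothetical stand-in for the LockAcquire() call made inside pgxc_advisory_lock(); it simulates one backend acquiring the same advisory lock twice.

/*
 * Sketch of the corrected return-value check.  The enum values mirror
 * PostgreSQL's LockAcquireResult; try_acquire() stands in for the real
 * LockAcquire() call and simulates a backend re-acquiring a lock it
 * already holds.
 */
#include <stdbool.h>
#include <stdio.h>

typedef enum
{
    LOCKACQUIRE_NOT_AVAIL,      /* lock not available, dontWait was true */
    LOCKACQUIRE_OK,             /* lock successfully acquired */
    LOCKACQUIRE_ALREADY_HELD    /* lock was already held by this backend */
} LockAcquireResult;

/* hypothetical stand-in: the second call on the same backend reports
 * the lock as already held rather than newly acquired */
static LockAcquireResult
try_acquire(void)
{
    static bool held = false;
    if (held)
        return LOCKACQUIRE_ALREADY_HELD;
    held = true;
    return LOCKACQUIRE_OK;
}

static bool
advisory_lock_succeeded(LockAcquireResult res)
{
    /*
     * The earlier version tested only res == LOCKACQUIRE_OK, so a
     * backend re-acquiring its own lock was wrongly reported as failure.
     */
    return (res == LOCKACQUIRE_OK || res == LOCKACQUIRE_ALREADY_HELD);
}

int main(void)
{
    printf("first call:  %s\n",
           advisory_lock_succeeded(try_acquire()) ? "t" : "f");
    printf("second call: %s\n",
           advisory_lock_succeeded(try_acquire()) ? "t" : "f");
    return 0;
}

With only the res == LOCKACQUIRE_OK test, the second call would report failure even though the session does hold the lock; accepting LOCKACQUIRE_ALREADY_HELD as well gives the behaviour the patch describes, while a lock held by a different backend still fails because LockAcquire returns LOCKACQUIRE_NOT_AVAIL when dontWait is true.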