From: Andrei M. <and...@gm...> - 2012-07-17 02:40:09
It is great that you are going to merge the Coordinator and the Datanode. Mason and I discussed this item but did not agree on it; now I am gaining a majority. We would be interested in participating in the discussion.

Anyway, merging the Coordinator and Datanode and merging the Proxy differ in terms of code changes. The first (Coordinator/Datanode) basically involves pooler, executor and planner module changes; the second (the Proxy merge) needs a wrapper and integration into the postmaster, and probably some refactoring in the gtm and gtm proxy modules. I think we can keep the proxy multithreaded, at least in the first integration stage. The pooler is already linked with the pthreads stuff, as it is using libpq.

Indeed, I think we would keep the socket communication between backend and pooler. I am not sure if a Unix socket is any faster than a TCP/IP socket, but it would be possible to use Unix sockets as well. Later on, we would be able to use shared memory. If the proxy maintains a local copy of the GTM snapshot in shared memory, it would be better for processes to make their local copies from that one than to send them around via sockets.

2012/7/17 Koichi Suzuki <koi...@gm...>
> 2012/7/16 Michael Paquier <mic...@gm...>:
> > This merge idea looks nice. But I see 2 issues.
> > 1) What is cruelly missing in the current design is the possibility to run a
> > Postgres-XC node as a GTM-Proxy standalone. This means that this Postgres-XC
> > node would only proxy messages to GTM and only that. It will need to reject
> > other client applications.
> > This would allow cascading of GTM Proxies and more flexibility in the
> > system.
>
> Cascading GTM proxies is not supported yet and I'm not sure what gain
> we would have with it. I don't think it's reasonable to run a
> coordinator/datanode instance for this purpose anyway.
>
> > If you guys are able to add that, well, we gain in code maintenance and keep
> > the same architecture possible.
> > 2) Performance? We will need to test that under a really heavy load before
> > committing it.
>
> Okay. The only performance gain I expect is that we can save the (local)
> socket communication between the backend and the proxy. However, if we
> make the proxy a separate process that just shares the binary, we could
> suffer from other overhead (for example, pg_proc locks, etc.).
>
> Because of the difference in the process model, integrating the proxy
> with the coordinator/backend may require rewriting the code.
>
> Regards;
> ---
> Koichi
>
> > --
> > Michael Paquier
> > http://michael.otacoo.com

--
Andrei Martsinchyk
StormDB - http://www.stormdb.com
The Database Cloud
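The Unix-socket-versus-TCP question raised above is easy to measure, because a purely local client connection differs only in how the address is set up. The following minimal POSIX sketch (not Postgres-XC code; the socket path and port are made-up examples) shows the two variants side by side so they can be swapped and timed:

```c
/*
 * Minimal POSIX sketch (not Postgres-XC code) of a local connection over a
 * Unix domain socket versus TCP loopback.  The socket path and port below
 * are invented examples, not anything from the XC configuration.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Connect over a Unix domain socket (e.g. to a local pooler/proxy). */
static int
connect_unix(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    if (fd < 0 || connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0)
        return -1;
    return fd;
}

/* Connect over TCP to the loopback interface. */
static int
connect_tcp_loopback(int port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (fd < 0 || connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0)
        return -1;
    return fd;
}

int
main(void)
{
    /* Hypothetical endpoints; time round-trips on each to compare them. */
    int ufd = connect_unix("/tmp/.s.PGPOOL.6667");
    int tfd = connect_tcp_loopback(6667);

    printf("unix fd=%d, tcp fd=%d\n", ufd, tfd);
    if (ufd >= 0) close(ufd);
    if (tfd >= 0) close(tfd);
    return 0;
}
```

On most platforms a Unix-domain socket skips the TCP/IP stack entirely, so for local backend-to-proxy traffic it is usually at least as fast as a loopback TCP connection, but the difference is best confirmed by measuring with the target workload.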
From: Koichi S. <koi...@gm...> - 2012-07-17 00:55:32
2012/7/16 Michael Paquier <mic...@gm...>:
> This merge idea looks nice. But I see 2 issues.
> 1) What is cruelly missing in the current design is the possibility to run a
> Postgres-XC node as a GTM-Proxy standalone. This means that this Postgres-XC
> node would only proxy messages to GTM and only that. It will need to reject
> other client applications.
> This would allow cascading of GTM Proxies and more flexibility in the
> system.

Cascading GTM proxies is not supported yet and I'm not sure what gain we would have with it. I don't think it's reasonable to run a coordinator/datanode instance for this purpose anyway.

> If you guys are able to add that, well, we gain in code maintenance and keep
> the same architecture possible.
> 2) Performance? We will need to test that under a really heavy load before
> committing it.

Okay. The only performance gain I expect is that we can save the (local) socket communication between the backend and the proxy. However, if we make the proxy a separate process that just shares the binary, we could suffer from other overhead (for example, pg_proc locks, etc.).

Because of the difference in the process model, integrating the proxy with the coordinator/backend may require rewriting the code.

Regards;
---
Koichi

> --
> Michael Paquier
> http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2012-07-17 00:05:52
|
Just a notice here. It was not in the initial design to allow redistribution in parallel of other alter table operations. On 2012/07/16, at 21:20, Ashutosh Bapat <ash...@en...> wrote: > Finally I got off reviewing this patch. > > One of the problems, with this patch is extensibility. At some point in future (and probably very near future), we should allow adding column (an example) and redistribution to be done at the same time, to reduce the total time of doing ALTER TABLE if it comes to adding column (an example) and redistribute the table at the same time. Basically we should allow all the table rewriting to be done at the same time as the PostgreSQL ALTER TABLE allows to do. The method used here does not leave a room for such combined operation, as an extension in future. This means, that when it comes to supporting the above said capability we have to rewrite the whole thing (leaving may be transformation and node manipulation aside). That's why, I would like to see a fix in ATRewriteTable, where we have sequence for every row 1 get row, 2. rewrite the row 3. write the new row. Does this mean to fetch rows one by one and then resend them back to dedicated nodes? Could you be a little bit more clear about the possible methods usable here. This remark makes sense but you should be more explicit about HOW to do. 1) For example do you mean using a select to fetch each row, then initialize a COPY TO at the beginning of rewrite, then run a for loop on the tuples obtained, materialize them, rewrite them and then send them back with the COPY TO row by row? In this case is it possible to fetch data on a connection and then send data on it at the same time. Is it even possible? 2) do you mean to use the tuplestore that contains all the data fetched and then initialize a COPY at rewrite, and once again do a for loop on the tuplestore tuples, rewrite them and send them. Seriously ideas are good, but ways to solve what you say, or just precisions about possible techniques and not vague assumptions would be better to save precious time. Your suggestions may require to change all the remote execution part for redistribution. It also highly probably makes most of your comments below useless. Thanks. > > Anyway, I have following comment the patch itself, > 1. Need better name for PgxcClassAlter(). > 2. In ATController, why do you need to move ATCheckCmd below ATPrepCmd? > 3. Comments on line 2933 and 2941 can be worded as "Perform pre-catalog-update redistribution operations" and "Perform post-catalog-update redistribution operations" > 4. Why can't redistribution be run in a transaction block? Does the redistribution run as a transaction? What happens if the server crashes while redistribution is being done at various stages of its progress? > 5. Do you want to rename BuildDistribCommands() as BuildReDistribCommands()? > 6. In function BuildDistribCommands(), What does variable new_oids signify? I think it's signifying the new node oids, if so please use name accordingly. > 7. The names of function tree_build_entry() and its minions look pretty generic, we need some prefix to these functions like pgxc_redist_ or something like that. BTW, what's the "tree" there indicate? There is nothing like tree that is being built, it's just a list of commands. > 8. Why are you using repalloc in tree_add_single_command(), you can rather create a list and append to that list. > 9. 
We don't need two separate functions Pre and Post - Update(), it can be a single function which takes the Pre/Post as flag and runs the relevant commands. BTW, just PreUpdate does not show its real meaning, it should be something like PreCatalogUpdate() or something like that. > 10. Why every catalog change to pgxc_class invalidates the cache? We should do it only once for a given command. > 11. The functions add_oid_list, delete_oid_list etc. are using oid arrays, then why use the suffice _list() in those functions? I do not like the frequence repalloc that happens in both these functions. Worst is the movement of array element in delete_node_list(). Any mistake here would be disastrous. PostgreSQL code is very generous in using memory to keep things simple. You can use lists or bitmaps if you want to save the space, but not repalloc. > 12. Instead of a single file distrib directory, you can use locator directory with distrib.c file (better if you could use name like at_distrib or redistrib, since the files really belong to ALTER TABLE? > 13. In makeRemoteCopyOptions you are using palloc0(), which sets all the memory with 0, so bools automatically get value false and NULL lists as NIL, you don't need to set those explicitly. > 14. I do not see any tests related to sanity of views, rules and other objects which depend upon the table, after the table is redistributed using this method. May be it's a good idea to provide a prologue at the beginning of the testcase specifying how the testcase is laid out. > > On Mon, Jul 16, 2012 at 4:56 PM, Ashutosh Bapat <ash...@en...> wrote: > > In this case you do not need any code! You could also do a simple CREATE TABLE AS to redistribution the table as you wish to a new table, drop the old table, and rename the new table with the old name. This could also be done with 1.0. > You should also have a look at my latest patch, it already includes the maximum optimizations possible for replicated table redistribution, particularly distrib.c. Just by looking at that you will see that a node level control is more than necessary. > > This method would change the OID of the table and thus invalidate all the view definitions, rules etc. depending on this table. We don't want that to happen. The function would not change the OID, but would write the data in the table's space itself. > > -- > Michael Paquier > http://michael.otacoo.com > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > |
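Michael's question above contrasts two ways of feeding rewritten rows back to the datanodes. A rough sketch of the second option (draining a tuplestore and streaming each tuple to its new owner through an already-open COPY) could look like the following; RedistCopyState, compute_target_node() and send_copy_row() are hypothetical placeholders, not types or functions that exist in the Postgres-XC tree, and the real loop would live in ATRewriteTable() or its XC equivalent:

```c
/*
 * Rough sketch of "option 2" above: drain a tuplestore holding the rows
 * fetched from the old node set, recompute the target node for each tuple,
 * and stream it back out through an already-initialized COPY FROM.
 * RedistCopyState, compute_target_node() and send_copy_row() are
 * hypothetical placeholders, not existing Postgres-XC code.
 */
#include "postgres.h"
#include "executor/tuptable.h"
#include "utils/tuplestore.h"

typedef struct RedistCopyState RedistCopyState;            /* hypothetical */

extern int  compute_target_node(TupleTableSlot *slot);     /* hypothetical */
extern void send_copy_row(RedistCopyState *cstate,
                          int target_node,
                          TupleTableSlot *slot);           /* hypothetical */

static void
redistribute_from_tuplestore(Tuplestorestate *store,
                             TupleTableSlot *slot,
                             RedistCopyState *cstate)
{
    /* tuplestore_gettupleslot() returns false once the store is exhausted */
    while (tuplestore_gettupleslot(store, true, false, slot))
    {
        /* 1. get the row (already in the slot), 2. rewrite/route it ... */
        int target = compute_target_node(slot);

        /* ... 3. write the new row to the node that should now own it */
        send_copy_row(cstate, target, slot);
    }
}
```

Whether the rows come one by one from a SELECT (option 1) or from a tuplestore (option 2), the inner loop is the same three steps the review asks for: get the row, rewrite it, write it to the node that should now own it.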
From: Koichi S. <koi...@gm...> - 2012-07-16 14:39:25
|
2012/7/16 Ashutosh Bapat <ash...@en...>: > Finally I got off reviewing this patch. > > One of the problems, with this patch is extensibility. At some point in > future (and probably very near future), we should allow adding column (an > example) and redistribution to be done at the same time, to reduce the total > time of doing ALTER TABLE if it comes to adding column (an example) and > redistribute the table at the same time. Basically we should allow all the > table rewriting to be done at the same time as the PostgreSQL ALTER TABLE > allows to do. The method used here does not leave a room for such combined > operation, as an extension in future. This means, that when it comes to > supporting the above said capability we have to rewrite the whole thing > (leaving may be transformation and node manipulation aside). That's why, I > would like to see a fix in ATRewriteTable, where we have sequence for every > row 1 get row, 2. rewrite the row 3. write the new row. I'm afraid this is too much for the current work. In the last core team meeting, we agreed not to include general ALTER TABLE feature in CONCURRENT operation. Although I think it's better to have such extension ready in the current feature, we have much more carried over beyond 2.0, it's also important to have features really work. > > Anyway, I have following comments for the patch itself, > 1. Need better name for PgxcClassAlter(). Any idea? > 2. In ATController, why do you need to move ATCheckCmd below ATPrepCmd? > 3. Comments on line 2933 and 2941 can be worded as "Perform > pre-catalog-update redistribution operations" and "Perform > post-catalog-update redistribution operations" > 4. Why can't redistribution be run in a transaction block? Does the > redistribution run as a transaction? What happens if the server crashes > while redistribution is being done at various stages of its progress? Hmm, this could be some limitation. PostgreSQL's ALTER TABLE, unlike MySQL and older version of Oracle, can run in a transaction block and most users would like this feature too. > 5. Do you want to rename BuildDistribCommands() as BuildReDistribCommands()? > 6. In function BuildDistribCommands(), What does variable new_oids signify? > I think it's signifying the new node oids, if so please use name > accordingly. > 7. The names of function tree_build_entry() and its minions look pretty > generic, we need some prefix to these functions like pgxc_redist_ or > something like that. BTW, what's the "tree" there indicate? There is nothing > like tree that is being built, it's just a list of commands. > 8. Why are you using repalloc in tree_add_single_command(), you can rather > create a list and append to that list. > 9. We don't need two separate functions Pre and Post - Update(), it can be a > single function which takes the Pre/Post as flag and runs the relevant > commands. BTW, just PreUpdate does not show its real meaning, it should be > something like PreCatalogUpdate() or something like that. > 10. Why every catalog change to pgxc_class invalidates the cache? We should > do it only once for a given command. > 11. The functions add_oid_list, delete_oid_list etc. are using oid arrays, > then why use the suffice _list() in those functions? I do not like the > frequence repalloc that happens in both these functions. Worst is the > movement of array element in delete_node_list(). Any mistake here would be > disastrous. PostgreSQL code is very generous in using memory to keep things > simple. 
You can use lists or bitmaps if you want to save the space, but not > repalloc. > 12. Instead of a single file distrib directory, you can use locator > directory with distrib.c file (better if you could use name like at_distrib > or redistrib, since the files really belong to ALTER TABLE? > 13. In makeRemoteCopyOptions you are using palloc0(), which sets all the > memory with 0, so bools automatically get value false and NULL lists as NIL, > you don't need to set those explicitly. > 14. I do not see any tests related to sanity of views, rules and other > objects which depend upon the table, after the table is redistributed using > this method. May be it's a good idea to provide a prologue at the beginning > of the testcase specifying how the testcase is laid out. > > > On Mon, Jul 16, 2012 at 4:56 PM, Ashutosh Bapat > <ash...@en...> wrote: >> >> >>> In this case you do not need any code! You could also do a simple CREATE >>> TABLE AS to redistribution the table as you wish to a new table, drop the >>> old table, and rename the new table with the old name. This could also be >>> done with 1.0. >>> You should also have a look at my latest patch, it already includes the >>> maximum optimizations possible for replicated table redistribution, >>> particularly distrib.c. Just by looking at that you will see that a node >>> level control is more than necessary. >> >> >> This method would change the OID of the table and thus invalidate all the >> view definitions, rules etc. depending on this table. We don't want that to >> happen. The function would not change the OID, but would write the data in >> the table's space itself. >> >>> >>> -- >>> Michael Paquier >>> http://michael.otacoo.com >> >> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > |
From: Ashutosh B. <ash...@en...> - 2012-07-16 12:20:15
Finally, I have finished reviewing this patch.

One of the problems with this patch is extensibility. At some point in the future (and probably the very near future), we should allow adding a column (as an example) and redistribution to be done at the same time, to reduce the total time of ALTER TABLE when it both adds a column and redistributes the table. Basically, we should allow all of the table rewriting to be done at the same time, as PostgreSQL's ALTER TABLE allows. The method used here does not leave room for such a combined operation as a future extension. This means that when it comes to supporting the above capability we will have to rewrite the whole thing (leaving maybe the transformation and node manipulation aside). That's why I would like to see a fix in ATRewriteTable, where we have a sequence for every row: 1. get the row, 2. rewrite the row, 3. write the new row.

Anyway, I have the following comments on the patch itself:

1. Need a better name for PgxcClassAlter().
2. In ATController, why do you need to move ATCheckCmd below ATPrepCmd?
3. Comments on lines 2933 and 2941 can be worded as "Perform pre-catalog-update redistribution operations" and "Perform post-catalog-update redistribution operations".
4. Why can't redistribution be run in a transaction block? Does the redistribution run as a transaction? What happens if the server crashes while redistribution is being done at various stages of its progress?
5. Do you want to rename BuildDistribCommands() as BuildReDistribCommands()?
6. In function BuildDistribCommands(), what does the variable new_oids signify? I think it signifies the new node OIDs; if so, please name it accordingly.
7. The names of the function tree_build_entry() and its minions look pretty generic; we need some prefix for these functions, like pgxc_redist_ or something like that. BTW, what does the "tree" there indicate? Nothing like a tree is being built; it's just a list of commands.
8. Why are you using repalloc in tree_add_single_command()? You can rather create a list and append to that list.
9. We don't need two separate Pre- and Post-Update() functions; it can be a single function which takes Pre/Post as a flag and runs the relevant commands. BTW, just PreUpdate does not show its real meaning; it should be something like PreCatalogUpdate() or so.
10. Why does every catalog change to pgxc_class invalidate the cache? We should do it only once for a given command.
11. The functions add_oid_list, delete_oid_list etc. are using Oid arrays, so why use the suffix _list() in those functions? I do not like the frequent repalloc that happens in both of these functions. Worst is the movement of array elements in delete_node_list(); any mistake here would be disastrous. PostgreSQL code is very generous in using memory to keep things simple. You can use lists or bitmaps if you want to save the space, but not repalloc.
12. Instead of a single-file distrib directory, you can use the locator directory with a distrib.c file (better if you could use a name like at_distrib or redistrib, since the files really belong to ALTER TABLE).
13. In makeRemoteCopyOptions you are using palloc0(), which zeroes all the memory, so bools automatically get the value false and lists NIL; you don't need to set those explicitly.
14. I do not see any tests related to the sanity of views, rules and other objects which depend upon the table after the table is redistributed using this method. Maybe it's a good idea to provide a prologue at the beginning of the testcase specifying how the testcase is laid out.

On Mon, Jul 16, 2012 at 4:56 PM, Ashutosh Bapat <ash...@en...> wrote:
>
>> In this case you do not need any code! You could also do a simple CREATE
>> TABLE AS to redistribute the table as you wish to a new table, drop the
>> old table, and rename the new table with the old name. This could also be
>> done with 1.0.
>> You should also have a look at my latest patch, it already includes the
>> maximum optimizations possible for replicated table redistribution,
>> particularly distrib.c. Just by looking at that you will see that a node
>> level control is more than necessary.
>
> This method would change the OID of the table and thus invalidate all the
> view definitions, rules etc. depending on this table. We don't want that to
> happen. The function would not change the OID, but would write the data in
> the table's space itself.
>
>> --
>> Michael Paquier
>> http://michael.otacoo.com
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Enterprise Postgres Company

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
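Review points 8 and 11 above come down to the same suggestion: track node OIDs with PostgreSQL's List machinery rather than repalloc'd Oid arrays. A generic illustration (not code from the patch under review; the function and variable names are made up) might look like this:

```c
/*
 * Generic illustration of review points 8 and 11: keep node OIDs in a
 * PostgreSQL List instead of a repalloc'd Oid array, so adding and removing
 * entries needs no manual element shuffling.  This is not code from the
 * patch under review; adjust_node_list() is an invented name.
 */
#include "postgres.h"
#include "nodes/pg_list.h"

static List *
adjust_node_list(List *node_oids, Oid added_node, Oid removed_node)
{
    ListCell   *lc;

    /* Append a new node OID; the list grows without repalloc bookkeeping. */
    if (OidIsValid(added_node) && !list_member_oid(node_oids, added_node))
        node_oids = lappend_oid(node_oids, added_node);

    /* Remove a node OID; no hand-written movement of array elements. */
    if (OidIsValid(removed_node))
        node_oids = list_delete_oid(node_oids, removed_node);

    /* Iterate the result, e.g. to build per-node redistribution commands. */
    foreach(lc, node_oids)
        elog(DEBUG1, "redistribution target node OID: %u", lfirst_oid(lc));

    return node_oids;
}
```

lappend_oid() and list_delete_oid() handle growth and element removal internally, which is exactly the bookkeeping the review objects to reimplementing with repalloc and manual element moves.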
From: Ashutosh B. <ash...@en...> - 2012-07-16 11:26:36
> In this case you do not need any code! You could also do a simple CREATE
> TABLE AS to redistribute the table as you wish to a new table, drop the
> old table, and rename the new table with the old name. This could also be
> done with 1.0.
> You should also have a look at my latest patch, it already includes the
> maximum optimizations possible for replicated table redistribution,
> particularly distrib.c. Just by looking at that you will see that a node
> level control is more than necessary.

This method would change the OID of the table and thus invalidate all the view definitions, rules etc. depending on this table. We don't want that to happen. The function would not change the OID, but would write the data in the table's space itself.

> --
> Michael Paquier
> http://michael.otacoo.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Michael P. <mic...@gm...> - 2012-07-16 07:31:43
|
On Mon, Jul 16, 2012 at 3:25 PM, Ashutosh Bapat < ash...@en...> wrote: > > >> Honestly I thought about such solutions before beginning my stuff, but >> arrived at the conclusion that we need a native way to manage >> redistribution. In this case, you fall under the following limitations: >> - How to manage node-level granularity for redistribution operations? In >> XC, by design, a SQL is launched globally on the cluster. At the exception >> of EXECUTE DIRECT but it only allows SELECT commands. >> > > Under some conditions (using a GUC, say) we may allow having TRUNCATE to > be executed as part of EXECUTE DIRECT, if it becomes necessary. But we may > have problems related to permissions. See below. > That is why is it really necessary to > > >> - When and how to manage the catalog update? Even if it is possible to >> update catalog with DMLs, we need native APIs to be able to modify catalog >> entries. >> > > We can update catalogs using simple catalog updates, and such updates are > transactionally safe. I verified that. (See my patch on separate thread). > Yes they are, but manipulating catalogs with DML is not postgreSQL-like, as usually catalog update has as consequences to make other internal operations like dependencies, permission checks and cache invalidation. > > >> - CREATE NODE is used for the addition and deletion of nodes. Data >> redistribution does not concern changing the configuration of the cluster. >> You got a certain number of Datanodes, Coordinators, and you want to change >> the nodes where data of a table is located inside this given cluster. Your >> approach makes redistribution dependent on cluster configuration and it is >> honestly not welcome to add such degrees of dependencies that may be a >> burden in the future if we change once again the way cluster is configured. >> > > Probably Nikhil misunderstood intent of CREATE NODE. CREATE NODE has > nothing to do with redistribution. > Yes, think so. > > >> - For certain redistribution operations, we do NOT always need a >> transaction block! Just take the example of a replicated table changed to a >> distributed table. You just need to send a DELETE query to remote nodes to >> remove only the tuples that do not satisfy a hash condition. This is one of >> those things I am working on now. >> > > We will need transaction block to make the utility transactionally safe. > >> To my mind, each argument here makes necessary this feature in core. All >> combined even strengthen my arguments. >> >> > I am not able to answer still serious problems - > 1. Checking permissions to allow such a function to be run. Should we > allow anyone who can run those DDLs/DMLs run the utility? > Only the owner of the table. But this needs to be controlled internally. We may have serious security issues lying under that. > 2. Cache invalidation - how do we invalidate plan caches after > redistributing the data? Plan stores the list of nodes where the queries > should be executed. Storing this list of nodes in plans is essential from > JOIN reduction. > Yes, and for a couple of other things. You will need to add externally some runs of DEALLOCATE ALL, but this will impact all tables and not only the one being redistributed. My latest patch only invalidated cache of the table involved by redistribution. > > Are there answers to these questions? > > But this approach has some merits - 1. It reduces the impact on the code. > Yes, but redistribution needs to be a core operation. > 2. 
There are few things not so good in this approach, like disk and > network usage, which needs to be optimized. By taking this approach, we > provide an easy (albeit unoptimised) way of re-distributing the tables and > can work on the final version of code changes for ALTER TABLE DISTRIBUTE > afterwards, keeping longer slot for the work. > In this case you do not need any code! You could also do a simple CREATE TABLE AS to redistribution the table as you wish to a new table, drop the old table, and rename the new table with the old name. This could also be done with 1.0. You should also have a look at my latest patch, it already includes the maximum optimizations possible for replicated table redistribution, particularly distrib.c. Just by looking at that you will see that a node level control is more than necessary. -- Michael Paquier http://michael.otacoo.com |
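One concrete case Michael mentions above is turning a replicated table into a hash-distributed one by sending each datanode a DELETE for the rows it should no longer own. A sketch of how such a per-node query string could be assembled follows; the hashtext()-modulo predicate is only a stand-in for illustration, not necessarily the distribution function Postgres-XC actually applies, and build_hash_prune_query() is not an existing API:

```c
/*
 * Sketch of the idea above: when a replicated table becomes hash-distributed,
 * each datanode can simply DELETE the rows whose hash does not map to it.
 * The predicate (hashtext() modulo the node count) is a stand-in for
 * illustration only, and build_hash_prune_query() does not exist in the tree.
 */
#include "postgres.h"
#include "lib/stringinfo.h"

static char *
build_hash_prune_query(const char *relname, const char *distcol,
                       int num_nodes, int my_node_index)
{
    StringInfoData buf;

    initStringInfo(&buf);
    appendStringInfo(&buf,
                     "DELETE FROM %s WHERE abs(hashtext(%s::text)) %% %d <> %d",
                     relname, distcol, num_nodes, my_node_index);

    /* e.g. DELETE FROM tab WHERE abs(hashtext(col::text)) % 4 <> 2 */
    return buf.data;
}
```

In real code the relation and column names would also need proper quoting (for example with quote_identifier()), and each datanode would receive the query built with its own node index, which is why this kind of node-level control is argued to belong in core rather than in an external SQL script.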
From: Ashutosh B. <ash...@en...> - 2012-07-16 06:25:37
|
> Honestly I thought about such solutions before beginning my stuff, but > arrived at the conclusion that we need a native way to manage > redistribution. In this case, you fall under the following limitations: > - How to manage node-level granularity for redistribution operations? In > XC, by design, a SQL is launched globally on the cluster. At the exception > of EXECUTE DIRECT but it only allows SELECT commands. > Under some conditions (using a GUC, say) we may allow having TRUNCATE to be executed as part of EXECUTE DIRECT, if it becomes necessary. But we may have problems related to permissions. See below. > - When and how to manage the catalog update? Even if it is possible to > update catalog with DMLs, we need native APIs to be able to modify catalog > entries. > We can update catalogs using simple catalog updates, and such updates are transactionally safe. I verified that. (See my patch on separate thread). > - CREATE NODE is used for the addition and deletion of nodes. Data > redistribution does not concern changing the configuration of the cluster. > You got a certain number of Datanodes, Coordinators, and you want to change > the nodes where data of a table is located inside this given cluster. Your > approach makes redistribution dependent on cluster configuration and it is > honestly not welcome to add such degrees of dependencies that may be a > burden in the future if we change once again the way cluster is configured. > Probably Nikhil misunderstood intent of CREATE NODE. CREATE NODE has nothing to do with redistribution. > - For certain redistribution operations, we do NOT always need a > transaction block! Just take the example of a replicated table changed to a > distributed table. You just need to send a DELETE query to remote nodes to > remove only the tuples that do not satisfy a hash condition. This is one of > those things I am working on now. > We will need transaction block to make the utility transactionally safe. > > To my mind, each argument here makes necessary this feature in core. All > combined even strengthen my arguments. > > I am not able to answer still serious problems - 1. Checking permissions to allow such a function to be run. Should we allow anyone who can run those DDLs/DMLs run the utility? 2. Cache invalidation - how do we invalidate plan caches after redistributing the data? Plan stores the list of nodes where the queries should be executed. Storing this list of nodes in plans is essential from JOIN reduction. Are there answers to these questions? But this approach has some merits - 1. It reduces the impact on the code. 2. There are few things not so good in this approach, like disk and network usage, which needs to be optimized. By taking this approach, we provide an easy (albeit unoptimised) way of re-distributing the tables and can work on the final version of code changes for ALTER TABLE DISTRIBUTE afterwards, keeping longer slot for the work. > >> Regards, >> Nikhils >> >> > Something I am afraid is not possible with an external utility is >> control of >> > redistribution at node level. For example, an external contrib module or >> > utility will launch SQL queries to xc that have to be treated as global. >> > However, redistribution needs to take care of cases like for example the >> > reduction of nodes for replicated tables. In this case you just need to >> > delete the data from removed nodes. Another easy example is the case of >> an >> > increase of nodes for replicated tables. 
You need to pick up data on >> > coordinator and then send it only to the new nodes. Those simple >> examples >> > need a core management to minimize the work of redistribution inside >> > cluster. >> > >> > On 2012/07/13, at 15:09, Ashutosh Bapat < >> ash...@en...> >> > wrote: >> > >> > Even, I am wondering if that would be better. >> > >> > But, one thing that is essential is catalog updates. Are you suggesting >> that >> > the catalog updates too should be done using some SQL? >> > >> > Can you please expand more on your idea, may be providing some examples, >> > pseudo-code etc.? >> > >> > On Fri, Jul 13, 2012 at 10:36 AM, Nikhil Sontakke <ni...@st...> >> > wrote: >> >> >> >> Just a thought. >> >> >> >> If we have a utility which spews out all of these statements to >> >> redistribute a table across node modifications, then we can just wrap >> >> them inside a transaction block and just run that? >> >> >> >> Wont it save all of the core changes? >> >> >> >> Regards, >> >> Nikhils >> >> >> >> On Fri, Jul 13, 2012 at 12:29 AM, Michael Paquier >> >> <mic...@gm...> wrote: >> >> > Hi all, >> >> > >> >> > Please find attached an updated patch adding redistribution >> >> > optimizations >> >> > for replicated tables. >> >> > If the node subset of a replicated table is reduced, the necessary >> nodes >> >> > are >> >> > simply truncated. >> >> > If it is increased, a COPY TO is done to fetch the data, and COPY >> FROM >> >> > is >> >> > done only on the necessary nodes. >> >> > New regression tests have been added to test that. >> >> > >> >> > Regards, >> >> > >> >> > >> >> > On Thu, Jul 12, 2012 at 5:30 PM, Michael Paquier >> >> > <mic...@gm...> >> >> > wrote: >> >> >> >> >> >> OK, here is the mammoth patch: 3000 lines including docs, >> >> >> implementation >> >> >> and regressions. >> >> >> The code has been realigned with current master. >> >> >> This patch introduces the latest thing I am working on: the >> >> >> redistribution >> >> >> command tree planning and execution. >> >> >> >> >> >> As I explained before, a redistribution consists of a series of >> >> >> commands >> >> >> (TRUNCATE, REINDEX, DELETE, COPY FROM, COPY TO) that need to be >> >> >> determined >> >> >> depending on the new and old locator information of the relation. >> Each >> >> >> action can be done on a subset of nodes. >> >> >> This patch introduces the basic infrastructure of the command tree >> >> >> build >> >> >> and execution. >> >> >> For the time being, redistribution uses only what is called the >> default >> >> >> command tree consisting of: >> >> >> 1) COPY TO >> >> >> 2) TRUNCATE >> >> >> 3) COPY FROM >> >> >> 4) REINDEX >> >> >> But this structure can be easily completed with more complicated >> >> >> operations. >> >> >> In this patch there is still a small thing missing which is the >> >> >> possibility to launch a COPY FROM on a subset of nodes, particularly >> >> >> useful >> >> >> when redistribution consists of a replicated table whose set of >> nodes >> >> >> is >> >> >> increased. >> >> >> Compared to the last versions, the impact of redistribution in >> >> >> tablecmds.c >> >> >> is limited. 
>> >> >> >> >> >> Regards, >> >> >> >> >> >> -- >> >> >> Michael Paquier >> >> >> http://michael.otacoo.com >> >> > >> >> > >> >> > >> >> > >> >> > -- >> >> > Michael Paquier >> >> > http://michael.otacoo.com >> >> > >> >> > >> >> > >> ------------------------------------------------------------------------------ >> >> > Live Security Virtual Conference >> >> > Exclusive live event will cover all the ways today's security and >> >> > threat landscape has changed and how IT managers can respond. >> >> > Discussions >> >> > will include endpoint security, mobile security and the latest in >> >> > malware >> >> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> >> > _______________________________________________ >> >> > Postgres-xc-developers mailing list >> >> > Pos...@li... >> >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> > >> >> >> >> >> >> >> >> -- >> >> StormDB - http://www.stormdb.com >> >> The Database Cloud >> > >> > >> > >> > >> > -- >> > Best Wishes, >> > Ashutosh Bapat >> > EntepriseDB Corporation >> > The Enterprise Postgres Company >> > >> >> >> >> -- >> StormDB - http://www.stormdb.com >> The Database Cloud >> > > > > -- > Michael Paquier > http://michael.otacoo.com > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Michael P. <mic...@gm...> - 2012-07-16 05:33:21
Here, make check works correctly with CentOS 5.X, Ubuntu and ArchLinux. You should have a look at each node's log file in src/test/regress/log/ to find out what is happening.

--
Michael Paquier
http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2012-07-16 05:32:42
|
It looks like you are not at the beach today, based in Ibaraki? :) On Mon, Jul 16, 2012 at 2:18 PM, <hir...@hi...> wrote: > > Hi. Develoers. > > In the environment using gtm_proxy, in order that error may come out, > please let me know the vital point of correction. > (when not using gtm_proxy ( = When accessing direct to gtm ), error does > not come out.) > > [ server's composition] > Node1(oss240g1): coord1, datanode1, gtm_proxy1 > Node2(oss240g2): coord2, datanode2, gtm_proxy2 > Node3(oss240g3): gtm and Operation of pgbench > > [oss240g3 pgbench operation] > [pgxc@oss240g3 co2dn2]$ pgbench -n -c 13 test1 -h oss240g1 > Client 2 aborted in state 12: ERROR: GTM error, could not obtain snapshot > Client 3 aborted in state 10: ERROR: GTM error, could not obtain snapshot > Client 9 aborted in state 11: ERROR: GTM error, could not obtain snapshot > transaction type: TPC-B (sort of) > scaling factor: 10 > query mode: simple > number of clients: 13 > number of threads: 1 > number of transactions per client: 10 > number of transactions actually processed: 126/130 > tps = .... > > [gtm's log] > 1:140678569928448:2012-07-15 17:31:38.641 JST -LOG: Any GTM standby node > not found in registered node(s). > LOCATION: gtm_standby_connect_to_standby_int, gtm_standby.c:376 > 1:140678569920256:2012-07-15 17:31:38.645 JST -LOG: > ProcessPGXCNodeRegister: ipaddress = "localhost", node name = > "gtm_proxy1", proxy name = "", datafolder "/data/pgxc/co2dn2/gtm_proxy1" > LOCATION: ProcessPGXCNodeRegister, register_gtm.c:99 > 1:140678569920256:2012-07-15 17:31:38.645 JST -LOG: Node type = 1 > LOCATION: ProcessPGXCNodeRegister, register_gtm.c:116 > 1:140678569920256:2012-07-15 17:31:38.646 JST -LOG: > Recovery_PGXCNodeRegister Request info: type=1, nodename=gtm_proxy1, > port=6666,datafolder=/data/pgxc/co2dn2/gtm_proxy1, ipaddress=localhost, > status=0 > LOCATION: Recovery_PGXCNodeRegister, register_common.c:397 > ........ > 1:140678538450688:2012-07-15 17:31:55.063 JST -LOG: Node type = 1 > LOCATION: ProcessPGXCNodeRegister, register_gtm.c:116 > 1:140678538450688:2012-07-15 17:31:55.063 JST -LOG: > Recovery_PGXCNodeRegister Request info: type=1, nodename=gtm_proxy2, > port=6666,datafolder=/data/pgxc/co2dn2/gtm_proxy2, ipaddress=localhost, > status=0 > LOCATION: Recovery_PGXCNodeRegister, register_common.c:397 > 1:140678538450688:2012-07-15 17:31:55.063 JST -LOG: > Recovery_PGXCNodeRegister Node info: type=1, nodename=gtm_proxy2, > port=6666, datafolder=/data/pgxc/co2dn2/gtm_proxy2, ipaddress=localhost, > status=0 > LOCATION: Recovery_PGXCNodeRegister, register_common.c:400 > 1:140678569928448:2012-07-15 17:31:55.080 JST -LOG: Any GTM standby node > not found in registered node(s). > ........ 
> 1:140678548940544:2012-07-15 17:36:57.701 JST -ERROR: Failed to get a > snapshot > LOCATION: ProcessGetSnapshotCommandMulti, gtm_snap.c:419 > 1:140678559430400:2012-07-15 17:36:57.702 JST -LOG: Sending transaction > ids from 20734 to 20735 > LOCATION: ProcessBeginTransactionGetGXIDCommandMulti, gtm_txn.c:1463 > 1:140678559430400:2012-07-15 17:36:57.720 JST -LOG: Committing: prepared > id 20721 and commit prepared id 20733 > LOCATION: ProcessCommitPreparedTransactionCommand, gtm_txn.c:1697 > 1:140678548940544:2012-07-15 17:36:57.722 JST -WARNING: Invalid > transaction handle: -1 > LOCATION: GTM_HandleToTransactionInfo, gtm_txn.c:206 > 1:140678548940544:2012-07-15 17:36:57.723 JST -ERROR: Failed to get a > snapshot > LOCATION: ProcessGetSnapshotCommandMulti, gtm_snap.c:419 > 1:140678559430400:2012-07-15 17:36:57.755 JST -LOG: Committing: prepared > id 20715 and commit prepared id 20734 > LOCATION: ProcessCommitPreparedTransactionCommand, gtm_txn.c:1697 > 1:140678548940544:2012-07-15 17:36:57.755 JST -LOG: Sending transaction > ids from 20735 to 20736 > LOCATION: ProcessBeginTransactionGetGXIDCommandMulti, gtm_txn.c:1463 > 1:140678548940544:2012-07-15 17:36:57.771 JST -WARNING: Invalid > transaction handle: -1 > LOCATION: GTM_HandleToTransactionInfo, gtm_txn.c:206 > 1:140678548940544:2012-07-15 17:36:57.771 JST -ERROR: Failed to get a > snapshot > LOCATION: ProcessGetSnapshotCommandMulti, gtm_snap.c:419 > 1:140678559430400:2012-07-15 17:36:57.772 JST -LOG: Sending transaction > ids from 20736 to 20737 > > [gtm_proxy1's log] > LOCATION: GTMProxy_SigleHandler, proxy_main.c:406 > 1:139748173334272:2012-07-15 17:31:37.788 JST -LOG: Starting GTM proxy at > (localhost:6666) > LOCATION: main, proxy_main.c:838 > 1:139748026468096:2012-07-15 17:36:56.718 JST -ERROR2: snapshot request > failed > It looks like you got a problem here. Has anybody seen such errors with GTM-proxy? > LOCATION: ProcessResponse, proxy_main.c:1952 > 1:139748026468096:2012-07-15 17:36:56.914 JST -ERROR: Wrong result > LOCATION: ProcessResponse, proxy_main.c:1921 > 1:139748171175680:2012-07-15 17:36:57.004 JST -ERROR2: Transaction commit > failed > LOCATION: ProcessResponse, proxy_main.c:1910 > 1:139748171175680:2012-07-15 17:37:17.115 JST -ERROR2: Transaction commit > failed > LOCATION: ProcessResponse, proxy_main.c:1910 > > [gtm_proxy2s log] > LOCATION: GTMProxy_SigleHandler, proxy_main.c:406 > 1:140023358203648:2012-07-15 17:31:55.485 JST -LOG: Starting GTM proxy at > (localhost:6666) > > [Postgres-XC Version] > postgtres-xc-postgres-xc-XC1_0_0_PG9_1-27-gd549c2f.zip > looks like more or less the latest code here. -- Michael Paquier http://michael.otacoo.com |
From: Michael P. <mic...@gm...> - 2012-07-16 05:29:49
This merge idea looks nice. But I see 2 issues.

1) What is cruelly missing in the current design is the possibility to run a Postgres-XC node as a GTM-Proxy standalone. This means that this Postgres-XC node would only proxy messages to GTM and only that. It will need to reject other client applications. This would allow cascading of GTM Proxies and more flexibility in the system. If you guys are able to add that, well, we gain in code maintenance and keep the same architecture possible.

2) Performance? We will need to test that under a really heavy load before committing it.

--
Michael Paquier
http://michael.otacoo.com
From: <hir...@hi...> - 2012-07-16 05:18:52
|
Hi. Develoers. In the environment using gtm_proxy, in order that error may come out, please let me know the vital point of correction. (when not using gtm_proxy ( = When accessing direct to gtm ), error does not come out.) [ server's composition] Node1(oss240g1): coord1, datanode1, gtm_proxy1 Node2(oss240g2): coord2, datanode2, gtm_proxy2 Node3(oss240g3): gtm and Operation of pgbench [oss240g3 pgbench operation] [pgxc@oss240g3 co2dn2]$ pgbench -n -c 13 test1 -h oss240g1 Client 2 aborted in state 12: ERROR: GTM error, could not obtain snapshot Client 3 aborted in state 10: ERROR: GTM error, could not obtain snapshot Client 9 aborted in state 11: ERROR: GTM error, could not obtain snapshot transaction type: TPC-B (sort of) scaling factor: 10 query mode: simple number of clients: 13 number of threads: 1 number of transactions per client: 10 number of transactions actually processed: 126/130 tps = .... [gtm's log] 1:140678569928448:2012-07-15 17:31:38.641 JST -LOG: Any GTM standby node not found in registered node(s). LOCATION: gtm_standby_connect_to_standby_int, gtm_standby.c:376 1:140678569920256:2012-07-15 17:31:38.645 JST -LOG: ProcessPGXCNodeRegister: ipaddress = "localhost", node name = "gtm_proxy1", proxy name = "", datafolder "/data/pgxc/co2dn2/gtm_proxy1" LOCATION: ProcessPGXCNodeRegister, register_gtm.c:99 1:140678569920256:2012-07-15 17:31:38.645 JST -LOG: Node type = 1 LOCATION: ProcessPGXCNodeRegister, register_gtm.c:116 1:140678569920256:2012-07-15 17:31:38.646 JST -LOG: Recovery_PGXCNodeRegister Request info: type=1, nodename=gtm_proxy1, port=6666,datafolder=/data/pgxc/co2dn2/gtm_proxy1, ipaddress=localhost, status=0 LOCATION: Recovery_PGXCNodeRegister, register_common.c:397 ........ 1:140678538450688:2012-07-15 17:31:55.063 JST -LOG: Node type = 1 LOCATION: ProcessPGXCNodeRegister, register_gtm.c:116 1:140678538450688:2012-07-15 17:31:55.063 JST -LOG: Recovery_PGXCNodeRegister Request info: type=1, nodename=gtm_proxy2, port=6666,datafolder=/data/pgxc/co2dn2/gtm_proxy2, ipaddress=localhost, status=0 LOCATION: Recovery_PGXCNodeRegister, register_common.c:397 1:140678538450688:2012-07-15 17:31:55.063 JST -LOG: Recovery_PGXCNodeRegister Node info: type=1, nodename=gtm_proxy2, port=6666, datafolder=/data/pgxc/co2dn2/gtm_proxy2, ipaddress=localhost, status=0 LOCATION: Recovery_PGXCNodeRegister, register_common.c:400 1:140678569928448:2012-07-15 17:31:55.080 JST -LOG: Any GTM standby node not found in registered node(s). ........ 
1:140678548940544:2012-07-15 17:36:57.701 JST -ERROR: Failed to get a snapshot LOCATION: ProcessGetSnapshotCommandMulti, gtm_snap.c:419 1:140678559430400:2012-07-15 17:36:57.702 JST -LOG: Sending transaction ids from 20734 to 20735 LOCATION: ProcessBeginTransactionGetGXIDCommandMulti, gtm_txn.c:1463 1:140678559430400:2012-07-15 17:36:57.720 JST -LOG: Committing: prepared id 20721 and commit prepared id 20733 LOCATION: ProcessCommitPreparedTransactionCommand, gtm_txn.c:1697 1:140678548940544:2012-07-15 17:36:57.722 JST -WARNING: Invalid transaction handle: -1 LOCATION: GTM_HandleToTransactionInfo, gtm_txn.c:206 1:140678548940544:2012-07-15 17:36:57.723 JST -ERROR: Failed to get a snapshot LOCATION: ProcessGetSnapshotCommandMulti, gtm_snap.c:419 1:140678559430400:2012-07-15 17:36:57.755 JST -LOG: Committing: prepared id 20715 and commit prepared id 20734 LOCATION: ProcessCommitPreparedTransactionCommand, gtm_txn.c:1697 1:140678548940544:2012-07-15 17:36:57.755 JST -LOG: Sending transaction ids from 20735 to 20736 LOCATION: ProcessBeginTransactionGetGXIDCommandMulti, gtm_txn.c:1463 1:140678548940544:2012-07-15 17:36:57.771 JST -WARNING: Invalid transaction handle: -1 LOCATION: GTM_HandleToTransactionInfo, gtm_txn.c:206 1:140678548940544:2012-07-15 17:36:57.771 JST -ERROR: Failed to get a snapshot LOCATION: ProcessGetSnapshotCommandMulti, gtm_snap.c:419 1:140678559430400:2012-07-15 17:36:57.772 JST -LOG: Sending transaction ids from 20736 to 20737 [gtm_proxy1's log] LOCATION: GTMProxy_SigleHandler, proxy_main.c:406 1:139748173334272:2012-07-15 17:31:37.788 JST -LOG: Starting GTM proxy at (localhost:6666) LOCATION: main, proxy_main.c:838 1:139748026468096:2012-07-15 17:36:56.718 JST -ERROR2: snapshot request failed LOCATION: ProcessResponse, proxy_main.c:1952 1:139748026468096:2012-07-15 17:36:56.914 JST -ERROR: Wrong result LOCATION: ProcessResponse, proxy_main.c:1921 1:139748171175680:2012-07-15 17:36:57.004 JST -ERROR2: Transaction commit failed LOCATION: ProcessResponse, proxy_main.c:1910 1:139748171175680:2012-07-15 17:37:17.115 JST -ERROR2: Transaction commit failed LOCATION: ProcessResponse, proxy_main.c:1910 [gtm_proxy2s log] LOCATION: GTMProxy_SigleHandler, proxy_main.c:406 1:140023358203648:2012-07-15 17:31:55.485 JST -LOG: Starting GTM proxy at (localhost:6666) [Postgres-XC Version] postgtres-xc-postgres-xc-XC1_0_0_PG9_1-27-gd549c2f.zip As mentioned above, I need your help well. ---------- Hiroshi Kise Hitachi, Ltd., IT Platform Division 5030, Totsuka-cyo, Totsuka-ku, Yokohama, 244-8555 Japan |
From: Koichi S. <koi...@gm...> - 2012-07-16 04:34:16
|
This is another reason gtm proxy merge can be done when coordinator/datanod are integrated into single node. Anyway, it has not been decided yet and would need more discussion. Regards; ---------- Koichi Suzuki 2012/7/16 Ashutosh Bapat <ash...@en...>: > Does that mean, that every coordinator, datanode runs a GTM-proxy? > > In typical XC configuration, we can have datanode and coordinator running on > same machine/server. For that matter we can have as many nodes running on a > server as one wants. I think, a GTM-proxy can be shared among them. > > On Sat, Jul 14, 2012 at 9:14 AM, Andrei Martsinchyk > <and...@gm...> wrote: >> >> We have an idea to merge GTM Proxy into the XC core and make it an >> auxiliary process, like pooler, background writer, logger, etc. >> Like any other auxiliary process it would be started with the node, >> stopped with the node and serve sessions running under the same postmaster. >> The session would be always communicate to GTM via internal proxy. >> Configuration parameters specific to GTM proxy will be moved to the main >> config file, some of them like GTM host/port are already there, they would >> be used by the Proxy, session would be connected to the port where it is >> listening, it would be in the same config file. >> We see following benefits of doing this: >> - We force people to connect to GTM via proxy and achieve better >> performance. >> - Better maintainability: -1 configuration file, -1 log file, -1 entity to >> manage and monitor. The proxy would share main configuration log, main log >> file, and postmaster would monitor proxy and restart if it fails, the >> housekeeping code is already here. >> - Effectiveness: having proxy running under the same postmaster as its >> clients would allow to use faster communication channels, like unix sockets, >> pipes, shared memory. >> - Reliability: if main GTM server goes down GTM proxy able to reconnect to >> a promoted standby, but not session. If session would always connect to the >> proxy it always would be able to reconnect. >> The only drawback we see is that the multiple nodes would not be able to >> share the same proxy. >> So, any ideas? Is it worth doing? Any pros and cons we are missing? >> >> -- >> Andrei Martsinchyk >> >> StormDB - http://www.stormdb.com >> The Database Cloud >> >> >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. 
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > |
From: Ashutosh B. <ash...@en...> - 2012-07-16 04:26:09
|
Does that mean, that every coordinator, datanode runs a GTM-proxy? In typical XC configuration, we can have datanode and coordinator running on same machine/server. For that matter we can have as many nodes running on a server as one wants. I think, a GTM-proxy can be shared among them. On Sat, Jul 14, 2012 at 9:14 AM, Andrei Martsinchyk < and...@gm...> wrote: > We have an idea to merge GTM Proxy into the XC core and make it an > auxiliary process, like pooler, background writer, logger, etc. > Like any other auxiliary process it would be started with the node, > stopped with the node and serve sessions running under the same postmaster. > The session would be always communicate to GTM via internal proxy. > Configuration parameters specific to GTM proxy will be moved to the main > config file, some of them like GTM host/port are already there, they would > be used by the Proxy, session would be connected to the port where it is > listening, it would be in the same config file. > We see following benefits of doing this: > - We force people to connect to GTM via proxy and achieve better > performance. > - Better maintainability: -1 configuration file, -1 log file, -1 entity to > manage and monitor. The proxy would share main configuration log, main log > file, and postmaster would monitor proxy and restart if it fails, the > housekeeping code is already here. > - Effectiveness: having proxy running under the same postmaster as its > clients would allow to use faster communication channels, like unix > sockets, pipes, shared memory. > - Reliability: if main GTM server goes down GTM proxy able to reconnect to > a promoted standby, but not session. If session would always connect to the > proxy it always would be able to reconnect. > The only drawback we see is that the multiple nodes would not be able to > share the same proxy. > So, any ideas? Is it worth doing? Any pros and cons we are missing? > > -- > Andrei Martsinchyk > > StormDB - http://www.stormdb.com > The Database Cloud > > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Koichi S. <koi...@gm...> - 2012-07-16 03:07:48
|
As we have another idea to merge coordinator and datanode for more performance gain, I think we could do this merge together. Current gtm proxy code is based upon gtm and is based upon multi-thread, it does not fit to coordinator/datanode backend. Maybe we need drastic rewrite to gtm-proxy at the merge. Regards; ---------- Koichi Suzuki 2012/7/14 Andrei Martsinchyk <and...@gm...>: > We have an idea to merge GTM Proxy into the XC core and make it an auxiliary > process, like pooler, background writer, logger, etc. > Like any other auxiliary process it would be started with the node, stopped > with the node and serve sessions running under the same postmaster. The > session would be always communicate to GTM via internal proxy. Configuration > parameters specific to GTM proxy will be moved to the main config file, some > of them like GTM host/port are already there, they would be used by the > Proxy, session would be connected to the port where it is listening, it > would be in the same config file. > We see following benefits of doing this: > - We force people to connect to GTM via proxy and achieve better > performance. > - Better maintainability: -1 configuration file, -1 log file, -1 entity to > manage and monitor. The proxy would share main configuration log, main log > file, and postmaster would monitor proxy and restart if it fails, the > housekeeping code is already here. > - Effectiveness: having proxy running under the same postmaster as its > clients would allow to use faster communication channels, like unix sockets, > pipes, shared memory. > - Reliability: if main GTM server goes down GTM proxy able to reconnect to a > promoted standby, but not session. If session would always connect to the > proxy it always would be able to reconnect. > The only drawback we see is that the multiple nodes would not be able to > share the same proxy. > So, any ideas? Is it worth doing? Any pros and cons we are missing? > > -- > Andrei Martsinchyk > > StormDB - http://www.stormdb.com > The Database Cloud > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > |
From: Koichi S. <koi...@gm...> - 2012-07-16 02:48:04
|
+1 for Michael's idea. Because tables are created by CREATE TABLE and dropped by DROP TABLE, it is quite natural to change the distribution with ALTER TABLE. It is very important to maintain SQL interface. Current Michael's implementation requires involved tables locked, it means that applications are blocked. As found in the feature request, we need redistribution to be done in backend. This was discussed in the cluster summit in PGCon 2012. This does not require to block application while redistribution are being done and will be nice for some cluster operation scene. Operation with tables locked is simpler and faster so I think we have use cases for both. Because the feature is the same, it is natural to provide these (foreground and background) redistribution in the same manner. It is also very very difficult to provide background redistribution without the core extension. So I think it is quite reasonable decision to provide them as ALTER TABLE. Background ALTER TABLE could be provided as "ALTER TABLE .... CONCURRENTLY" as done in CREATE INDEX. Regards; ---------- Koichi Suzuki 2012/7/14 Michael Paquier <mic...@gm...>: > > > On Sat, Jul 14, 2012 at 1:01 AM, Nikhil Sontakke <ni...@st...> > wrote: >> >> > But, one thing that is essential is catalog updates. Are you suggesting >> > that >> > the catalog updates too should be done using some SQL? >> >> Surely, we can do: >> >> BEGIN; >> >> CREATE NODE; /* Hmm, I think we should have a CREATE CLUSTER NODE >> version*/ >> >> ALL the redistribution SQL here; >> >> COMMIT; >> >> The new node info will be visible to the rest of the SQL commands that >> follow this ways. >> >> Michael, I don't have a strong view against adding stuff in the core, >> but since the first cut for redistribution seems to be a bunch of SQL >> grouped together, I thought this might be worth investigating too. > > Honestly I thought about such solutions before beginning my stuff, but > arrived at the conclusion that we need a native way to manage > redistribution. In this case, you fall under the following limitations: > - How to manage node-level granularity for redistribution operations? In XC, > by design, a SQL is launched globally on the cluster. At the exception of > EXECUTE DIRECT but it only allows SELECT commands. > - When and how to manage the catalog update? Even if it is possible to > update catalog with DMLs, we need native APIs to be able to modify catalog > entries. > - CREATE NODE is used for the addition and deletion of nodes. Data > redistribution does not concern changing the configuration of the cluster. > You got a certain number of Datanodes, Coordinators, and you want to change > the nodes where data of a table is located inside this given cluster. Your > approach makes redistribution dependent on cluster configuration and it is > honestly not welcome to add such degrees of dependencies that may be a > burden in the future if we change once again the way cluster is configured. > - For certain redistribution operations, we do NOT always need a transaction > block! Just take the example of a replicated table changed to a distributed > table. You just need to send a DELETE query to remote nodes to remove only > the tuples that do not satisfy a hash condition. This is one of those things > I am working on now. > > To my mind, each argument here makes necessary this feature in core. All > combined even strengthen my arguments. 
> >> >> Regards, >> Nikhils >> >> > Something I am afraid is not possible with an external utility is >> > control of >> > redistribution at node level. For example, an external contrib module or >> > utility will launch SQL queries to xc that have to be treated as global. >> > However, redistribution needs to take care of cases like for example the >> > reduction of nodes for replicated tables. In this case you just need to >> > delete the data from removed nodes. Another easy example is the case of >> > an >> > increase of nodes for replicated tables. You need to pick up data on >> > coordinator and then send it only to the new nodes. Those simple >> > examples >> > need a core management to minimize the work of redistribution inside >> > cluster. >> > >> > On 2012/07/13, at 15:09, Ashutosh Bapat >> > <ash...@en...> >> > wrote: >> > >> > Even, I am wondering if that would be better. >> > >> > But, one thing that is essential is catalog updates. Are you suggesting >> > that >> > the catalog updates too should be done using some SQL? >> > >> > Can you please expand more on your idea, may be providing some examples, >> > pseudo-code etc.? >> > >> > On Fri, Jul 13, 2012 at 10:36 AM, Nikhil Sontakke <ni...@st...> >> > wrote: >> >> >> >> Just a thought. >> >> >> >> If we have a utility which spews out all of these statements to >> >> redistribute a table across node modifications, then we can just wrap >> >> them inside a transaction block and just run that? >> >> >> >> Wont it save all of the core changes? >> >> >> >> Regards, >> >> Nikhils >> >> >> >> On Fri, Jul 13, 2012 at 12:29 AM, Michael Paquier >> >> <mic...@gm...> wrote: >> >> > Hi all, >> >> > >> >> > Please find attached an updated patch adding redistribution >> >> > optimizations >> >> > for replicated tables. >> >> > If the node subset of a replicated table is reduced, the necessary >> >> > nodes >> >> > are >> >> > simply truncated. >> >> > If it is increased, a COPY TO is done to fetch the data, and COPY >> >> > FROM >> >> > is >> >> > done only on the necessary nodes. >> >> > New regression tests have been added to test that. >> >> > >> >> > Regards, >> >> > >> >> > >> >> > On Thu, Jul 12, 2012 at 5:30 PM, Michael Paquier >> >> > <mic...@gm...> >> >> > wrote: >> >> >> >> >> >> OK, here is the mammoth patch: 3000 lines including docs, >> >> >> implementation >> >> >> and regressions. >> >> >> The code has been realigned with current master. >> >> >> This patch introduces the latest thing I am working on: the >> >> >> redistribution >> >> >> command tree planning and execution. >> >> >> >> >> >> As I explained before, a redistribution consists of a series of >> >> >> commands >> >> >> (TRUNCATE, REINDEX, DELETE, COPY FROM, COPY TO) that need to be >> >> >> determined >> >> >> depending on the new and old locator information of the relation. >> >> >> Each >> >> >> action can be done on a subset of nodes. >> >> >> This patch introduces the basic infrastructure of the command tree >> >> >> build >> >> >> and execution. >> >> >> For the time being, redistribution uses only what is called the >> >> >> default >> >> >> command tree consisting of: >> >> >> 1) COPY TO >> >> >> 2) TRUNCATE >> >> >> 3) COPY FROM >> >> >> 4) REINDEX >> >> >> But this structure can be easily completed with more complicated >> >> >> operations. 
>> >> >> In this patch there is still a small thing missing which is the >> >> >> possibility to launch a COPY FROM on a subset of nodes, particularly >> >> >> useful >> >> >> when redistribution consists of a replicated table whose set of >> >> >> nodes >> >> >> is >> >> >> increased. >> >> >> Compared to the last versions, the impact of redistribution in >> >> >> tablecmds.c >> >> >> is limited. >> >> >> >> >> >> Regards, >> >> >> >> >> >> -- >> >> >> Michael Paquier >> >> >> http://michael.otacoo.com >> >> > >> >> > >> >> > >> >> > >> >> > -- >> >> > Michael Paquier >> >> > http://michael.otacoo.com >> >> > >> >> > >> >> > >> >> > ------------------------------------------------------------------------------ >> >> > Live Security Virtual Conference >> >> > Exclusive live event will cover all the ways today's security and >> >> > threat landscape has changed and how IT managers can respond. >> >> > Discussions >> >> > will include endpoint security, mobile security and the latest in >> >> > malware >> >> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> >> > _______________________________________________ >> >> > Postgres-xc-developers mailing list >> >> > Pos...@li... >> >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> > >> >> >> >> >> >> >> >> -- >> >> StormDB - http://www.stormdb.com >> >> The Database Cloud >> > >> > >> > >> > >> > -- >> > Best Wishes, >> > Ashutosh Bapat >> > EntepriseDB Corporation >> > The Enterprise Postgres Company >> > >> >> >> >> -- >> StormDB - http://www.stormdb.com >> The Database Cloud > > > > > -- > Michael Paquier > http://michael.otacoo.com > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > |
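To make the interface discussed above concrete, here is a rough SQL sketch. The table and column names are invented, the DISTRIBUTE BY clause follows the redistribution patch under review in this thread, and the CONCURRENTLY form is only the analogy drawn with CREATE INDEX CONCURRENTLY; none of this grammar should be read as final.

    -- Foreground redistribution: the table is locked, so applications are
    -- blocked while rows are moved and indexes are rebuilt.
    ALTER TABLE orders DISTRIBUTE BY HASH (order_id);

    -- Hypothetical background variant, by analogy with CREATE INDEX
    -- CONCURRENTLY: same catalog change, but data is moved without
    -- blocking applications.
    ALTER TABLE orders DISTRIBUTE BY HASH (order_id) CONCURRENTLY;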
From: Andrei M. <and...@gm...> - 2012-07-14 03:44:52
|
We have an idea to merge GTM Proxy into the XC core and make it an auxiliary process, like the pooler, background writer, logger, etc. Like any other auxiliary process it would be started and stopped with the node and would serve the sessions running under the same postmaster. Sessions would then always communicate with GTM through this internal proxy. Configuration parameters specific to the GTM proxy would move to the main config file; some of them, such as the GTM host and port, are already there and would be used by the proxy, and the port the proxy listens on would sit in the same file, so sessions know where to connect.
We see the following benefits of doing this:
- We force people to connect to GTM via a proxy and so achieve better performance.
- Better maintainability: one less configuration file, one less log file, one less entity to manage and monitor. The proxy would share the main configuration file and the main log file, and the postmaster would monitor the proxy and restart it if it fails; that housekeeping code is already there.
- Efficiency: having the proxy run under the same postmaster as its clients allows faster communication channels, such as Unix sockets, pipes, or shared memory.
- Reliability: if the main GTM server goes down, a GTM proxy can reconnect to a promoted standby, but a plain session cannot. If sessions always connect through the proxy, they can always reconnect.
The only drawback we see is that multiple nodes would not be able to share the same proxy.
So, any ideas? Is it worth doing? Any pros or cons we are missing?
--
Andrei Martsinchyk
StormDB - http://www.stormdb.com
The Database Cloud |
From: Michael P. <mic...@gm...> - 2012-07-13 23:54:28
|
On Sat, Jul 14, 2012 at 1:01 AM, Nikhil Sontakke <ni...@st...>wrote: > > But, one thing that is essential is catalog updates. Are you suggesting > that > > the catalog updates too should be done using some SQL? > > Surely, we can do: > > BEGIN; > > CREATE NODE; /* Hmm, I think we should have a CREATE CLUSTER NODE version*/ > > ALL the redistribution SQL here; > > COMMIT; > > The new node info will be visible to the rest of the SQL commands that > follow this ways. > > Michael, I don't have a strong view against adding stuff in the core, > but since the first cut for redistribution seems to be a bunch of SQL > grouped together, I thought this might be worth investigating too. > Honestly I thought about such solutions before beginning my stuff, but arrived at the conclusion that we need a native way to manage redistribution. In this case, you fall under the following limitations: - How to manage node-level granularity for redistribution operations? In XC, by design, a SQL is launched globally on the cluster. At the exception of EXECUTE DIRECT but it only allows SELECT commands. - When and how to manage the catalog update? Even if it is possible to update catalog with DMLs, we need native APIs to be able to modify catalog entries. - CREATE NODE is used for the addition and deletion of nodes. Data redistribution does not concern changing the configuration of the cluster. You got a certain number of Datanodes, Coordinators, and you want to change the nodes where data of a table is located inside this given cluster. Your approach makes redistribution dependent on cluster configuration and it is honestly not welcome to add such degrees of dependencies that may be a burden in the future if we change once again the way cluster is configured. - For certain redistribution operations, we do NOT always need a transaction block! Just take the example of a replicated table changed to a distributed table. You just need to send a DELETE query to remote nodes to remove only the tuples that do not satisfy a hash condition. This is one of those things I am working on now. To my mind, each argument here makes necessary this feature in core. All combined even strengthen my arguments. > Regards, > Nikhils > > > Something I am afraid is not possible with an external utility is > control of > > redistribution at node level. For example, an external contrib module or > > utility will launch SQL queries to xc that have to be treated as global. > > However, redistribution needs to take care of cases like for example the > > reduction of nodes for replicated tables. In this case you just need to > > delete the data from removed nodes. Another easy example is the case of > an > > increase of nodes for replicated tables. You need to pick up data on > > coordinator and then send it only to the new nodes. Those simple examples > > need a core management to minimize the work of redistribution inside > > cluster. > > > > On 2012/07/13, at 15:09, Ashutosh Bapat <ash...@en... > > > > wrote: > > > > Even, I am wondering if that would be better. > > > > But, one thing that is essential is catalog updates. Are you suggesting > that > > the catalog updates too should be done using some SQL? > > > > Can you please expand more on your idea, may be providing some examples, > > pseudo-code etc.? > > > > On Fri, Jul 13, 2012 at 10:36 AM, Nikhil Sontakke <ni...@st...> > > wrote: > >> > >> Just a thought. 
> >> > >> If we have a utility which spews out all of these statements to > >> redistribute a table across node modifications, then we can just wrap > >> them inside a transaction block and just run that? > >> > >> Wont it save all of the core changes? > >> > >> Regards, > >> Nikhils > >> > >> On Fri, Jul 13, 2012 at 12:29 AM, Michael Paquier > >> <mic...@gm...> wrote: > >> > Hi all, > >> > > >> > Please find attached an updated patch adding redistribution > >> > optimizations > >> > for replicated tables. > >> > If the node subset of a replicated table is reduced, the necessary > nodes > >> > are > >> > simply truncated. > >> > If it is increased, a COPY TO is done to fetch the data, and COPY FROM > >> > is > >> > done only on the necessary nodes. > >> > New regression tests have been added to test that. > >> > > >> > Regards, > >> > > >> > > >> > On Thu, Jul 12, 2012 at 5:30 PM, Michael Paquier > >> > <mic...@gm...> > >> > wrote: > >> >> > >> >> OK, here is the mammoth patch: 3000 lines including docs, > >> >> implementation > >> >> and regressions. > >> >> The code has been realigned with current master. > >> >> This patch introduces the latest thing I am working on: the > >> >> redistribution > >> >> command tree planning and execution. > >> >> > >> >> As I explained before, a redistribution consists of a series of > >> >> commands > >> >> (TRUNCATE, REINDEX, DELETE, COPY FROM, COPY TO) that need to be > >> >> determined > >> >> depending on the new and old locator information of the relation. > Each > >> >> action can be done on a subset of nodes. > >> >> This patch introduces the basic infrastructure of the command tree > >> >> build > >> >> and execution. > >> >> For the time being, redistribution uses only what is called the > default > >> >> command tree consisting of: > >> >> 1) COPY TO > >> >> 2) TRUNCATE > >> >> 3) COPY FROM > >> >> 4) REINDEX > >> >> But this structure can be easily completed with more complicated > >> >> operations. > >> >> In this patch there is still a small thing missing which is the > >> >> possibility to launch a COPY FROM on a subset of nodes, particularly > >> >> useful > >> >> when redistribution consists of a replicated table whose set of nodes > >> >> is > >> >> increased. > >> >> Compared to the last versions, the impact of redistribution in > >> >> tablecmds.c > >> >> is limited. > >> >> > >> >> Regards, > >> >> > >> >> -- > >> >> Michael Paquier > >> >> http://michael.otacoo.com > >> > > >> > > >> > > >> > > >> > -- > >> > Michael Paquier > >> > http://michael.otacoo.com > >> > > >> > > >> > > ------------------------------------------------------------------------------ > >> > Live Security Virtual Conference > >> > Exclusive live event will cover all the ways today's security and > >> > threat landscape has changed and how IT managers can respond. > >> > Discussions > >> > will include endpoint security, mobile security and the latest in > >> > malware > >> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > >> > _______________________________________________ > >> > Postgres-xc-developers mailing list > >> > Pos...@li... 
> >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > >> > > >> > >> > >> > >> -- > >> StormDB - http://www.stormdb.com > >> The Database Cloud > > > > > > > > > > -- > > Best Wishes, > > Ashutosh Bapat > > EntepriseDB Corporation > > The Enterprise Postgres Company > > > > > > -- > StormDB - http://www.stormdb.com > The Database Cloud > -- Michael Paquier http://michael.otacoo.com |
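A minimal illustration of the node-level gap argued above, assuming the EXECUTE DIRECT spelling of the XC grammar of that era and invented node/table names; the hash expression in the comment is only one plausible way to write the pruning condition, not the exact query the coordinator would generate.

    -- Targeting a single node from SQL works for reads:
    EXECUTE DIRECT ON (dn1) 'SELECT count(*) FROM orders';

    -- But EXECUTE DIRECT only accepts SELECT, so the per-node DELETE needed
    -- when a replicated table becomes hash-distributed, conceptually on
    -- datanode i out of N:
    --     DELETE FROM orders WHERE abs(hashint4(order_id)) % N <> i;
    -- cannot be issued from an external script and has to be driven by the
    -- coordinator's redistribution code.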
From: Nikhil S. <ni...@st...> - 2012-07-13 16:01:52
|
> But, one thing that is essential is catalog updates. Are you suggesting that > the catalog updates too should be done using some SQL? Surely, we can do: BEGIN; CREATE NODE; /* Hmm, I think we should have a CREATE CLUSTER NODE version*/ ALL the redistribution SQL here; COMMIT; The new node info will be visible to the rest of the SQL commands that follow this ways. Michael, I don't have a strong view against adding stuff in the core, but since the first cut for redistribution seems to be a bunch of SQL grouped together, I thought this might be worth investigating too. Regards, Nikhils > Something I am afraid is not possible with an external utility is control of > redistribution at node level. For example, an external contrib module or > utility will launch SQL queries to xc that have to be treated as global. > However, redistribution needs to take care of cases like for example the > reduction of nodes for replicated tables. In this case you just need to > delete the data from removed nodes. Another easy example is the case of an > increase of nodes for replicated tables. You need to pick up data on > coordinator and then send it only to the new nodes. Those simple examples > need a core management to minimize the work of redistribution inside > cluster. > > On 2012/07/13, at 15:09, Ashutosh Bapat <ash...@en...> > wrote: > > Even, I am wondering if that would be better. > > But, one thing that is essential is catalog updates. Are you suggesting that > the catalog updates too should be done using some SQL? > > Can you please expand more on your idea, may be providing some examples, > pseudo-code etc.? > > On Fri, Jul 13, 2012 at 10:36 AM, Nikhil Sontakke <ni...@st...> > wrote: >> >> Just a thought. >> >> If we have a utility which spews out all of these statements to >> redistribute a table across node modifications, then we can just wrap >> them inside a transaction block and just run that? >> >> Wont it save all of the core changes? >> >> Regards, >> Nikhils >> >> On Fri, Jul 13, 2012 at 12:29 AM, Michael Paquier >> <mic...@gm...> wrote: >> > Hi all, >> > >> > Please find attached an updated patch adding redistribution >> > optimizations >> > for replicated tables. >> > If the node subset of a replicated table is reduced, the necessary nodes >> > are >> > simply truncated. >> > If it is increased, a COPY TO is done to fetch the data, and COPY FROM >> > is >> > done only on the necessary nodes. >> > New regression tests have been added to test that. >> > >> > Regards, >> > >> > >> > On Thu, Jul 12, 2012 at 5:30 PM, Michael Paquier >> > <mic...@gm...> >> > wrote: >> >> >> >> OK, here is the mammoth patch: 3000 lines including docs, >> >> implementation >> >> and regressions. >> >> The code has been realigned with current master. >> >> This patch introduces the latest thing I am working on: the >> >> redistribution >> >> command tree planning and execution. >> >> >> >> As I explained before, a redistribution consists of a series of >> >> commands >> >> (TRUNCATE, REINDEX, DELETE, COPY FROM, COPY TO) that need to be >> >> determined >> >> depending on the new and old locator information of the relation. Each >> >> action can be done on a subset of nodes. >> >> This patch introduces the basic infrastructure of the command tree >> >> build >> >> and execution. 
>> >> For the time being, redistribution uses only what is called the default >> >> command tree consisting of: >> >> 1) COPY TO >> >> 2) TRUNCATE >> >> 3) COPY FROM >> >> 4) REINDEX >> >> But this structure can be easily completed with more complicated >> >> operations. >> >> In this patch there is still a small thing missing which is the >> >> possibility to launch a COPY FROM on a subset of nodes, particularly >> >> useful >> >> when redistribution consists of a replicated table whose set of nodes >> >> is >> >> increased. >> >> Compared to the last versions, the impact of redistribution in >> >> tablecmds.c >> >> is limited. >> >> >> >> Regards, >> >> >> >> -- >> >> Michael Paquier >> >> http://michael.otacoo.com >> > >> > >> > >> > >> > -- >> > Michael Paquier >> > http://michael.otacoo.com >> > >> > >> > ------------------------------------------------------------------------------ >> > Live Security Virtual Conference >> > Exclusive live event will cover all the ways today's security and >> > threat landscape has changed and how IT managers can respond. >> > Discussions >> > will include endpoint security, mobile security and the latest in >> > malware >> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> > _______________________________________________ >> > Postgres-xc-developers mailing list >> > Pos...@li... >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> > >> >> >> >> -- >> StormDB - http://www.stormdb.com >> The Database Cloud > > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > -- StormDB - http://www.stormdb.com The Database Cloud |
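For the record, this is the kind of script such a utility might spew out, following the default command tree (COPY TO, TRUNCATE, COPY FROM, REINDEX) described in the quoted patch mail. Table, node, and file names are made up, the CREATE NODE form mirrors the one used in the regression setup elsewhere on this page, and whether every step really behaves as intended inside one transaction block is exactly what is in question.

    BEGIN;
    CREATE NODE dn3 WITH (HOST = 'node3', TYPE = 'datanode', PORT = 15432);
    -- default redistribution sequence
    COPY orders TO '/tmp/orders.dat';
    TRUNCATE orders;
    -- the distribution catalog entry for "orders" would have to change here,
    -- and there is no SQL-level command for that
    COPY orders FROM '/tmp/orders.dat';
    REINDEX TABLE orders;
    COMMIT;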
From: nop <no...@in...> - 2012-07-13 14:37:58
|
On Fri, Jul 13, 2012 at 12:50 AM, Michael Paquier <mic...@gm... > wrote: > > > On Fri, Jul 13, 2012 at 11:45 AM, nop <no...@in...> wrote: > >> Hi, >> >> Currently HEAD won't build because of a merge conflict. The attached >> patch resolves that. >> >> Cheers, >> Andrew >> > Thanks this has been fixed: > http://github.com/postgres-xc/postgres-xc/commit/5655bc5 > I didn't reused your patch, as you did not take into account the options > "-undefined suppress -flat_namespace" in LINK.shared that are used when > compiling code on MacOS. > Regards. > -- > Michael Paquier > http://michael.otacoo.com > Ah. OS X. Thanks for fixing it. |
From: Amit K. <ami...@en...> - 2012-07-13 11:57:45
|
Ashutosh/Michael, thanks for your comments. More comments inline. I have gone through each of the statements to see how we can tackle each one, and have uploaded the details here: https://docs.google.com/document/d/1KtPU8jrztHtGf_hzerCVacSoUPY36MK7aEpI6oqFQJc/edit To see if we can apply the approach, I have tried out some ad-hoc code-changes with sample statements namely create-tablespace, and alter-database-set-tablespace, and basic things seem to work. On 4 July 2012 09:51, Ashutosh Bapat <ash...@en...>wrote: > Amit, > If a complete automated solution is not possible, you can as well think > about reporting an error and expecting human intervention. > Yes, in many cases, the final resort is to report a message showing which particular node has failed. THis is after trying out a few things and then giving up. > On Tue, Jul 3, 2012 at 10:16 AM, Michael Paquier < > mic...@gm...> wrote: > >> >> On Fri, Jun 29, 2012 at 8:34 PM, Amit Khandekar < >> ami...@en...> wrote: >> >>> For utility statements in general, the coordinator propagates SQL >>> statements to all the required nodes, and most of these statements get run >>> on the datanodes inside a transaction block. So, when the statement fails >>> on at least one of the nodes, the statement gets rollbacked on all the >>> nodes due to the two-phase commit taking place, and therefore the cluster >>> rollbacks to a consistent state. But there are some statements which >>> cannot be run inside a transaction block. Here are some important ones: >>> CREATE/DROP DATABASE >>> CREATE/DROP TABLESPACE >>> ALTER DATABASE SET TABLESPACE >>> ALTER TYPE ADD ... (for enum types) >>> CREATE INDEX CONCURRENTLY >>> REINDEX DATABASE >>> DISCARD ALL >>> >>> So such statements run on datanodes in auto-commit mode, and so create >>> problems if they succeed on some nodes and abort on other nodes. For e.g. >>> : CREATE DATABASE. If a datanode d1 returns with error, and any other >>> datanode d2 has already returned back to coordinator with success, the >>> coordinator can't undo the commit of d2 because this is already committed. >>> Or if the coordinator itself crashes after datanodes commit but before the >>> coordinator commits, then again we have the same problem. The database >>> cannot be recreated from coordinator, since it is already created on some >>> of the other nodes. In such a cluster state, administrator needs to connect >>> to datanodes and do the needed cleanup. >>> >>> The committed statements can be followed by statements that undo the >>> operation, for e.g. DROP DATABASE for a CREATE DATABASE. But here again >>> this statement can fail for some reason. Also, typically for such >>> statements, their UNDO counterparts themselves cannot be run inside a >>> transaction block as well. So this is not a guaranteed way to bring back >>> the cluster to a consistent state. >>> >>> To find out how we can get around this issue, let's see why these >>> statements require to be run outside a transaction block in the first >>> place. There are two reasons why: >>> >>> 1. Typically such statements modify OS files and directories which >>> cannot be rollbacked. >>> >>> For DMLs, the rollback does not have to be explicitly undone. MVCC takes >>> care of it. But for OS file operations, there is no automatic way. So such >>> operations cannot be rollbacked. 
So in a transaction block, if a >>> create-database is followed by 10 other SQL statements before commit, and >>> one of the statements throws an error, ultimately the database won't be >>> created but there will be database files taking up disk space, and this has >>> happened just because the user has written the script wrongly. >>> >>> So by restricting such statement to be run outside a transaction block, >>> an unrelated error won't cause garbage files to be created. >>> >>> The statement itself does get committed eventually as usual. And it can >>> also get rolled back in the end. But maximum care has been taken in the >>> statement function (for e.g. createdb) such that the chances of an error >>> occurring *after* the files are created is least. For this, such a code >>> segment is inside PG_ENSURE_ERROR_CLEANUP() with some error_callback >>> function (createdb_failure_callback) which tries to clean up the files >>> created. >>> >>> So the end result is that this window between files-created and >>> error-occurred is minimized, not that such statements will never create >>> such cleanup issues if run outside transaction block. >>> >>> Possible solution: >>> >>> So regarding Postgres-XC, if we let such statements to be run inside >>> transaction block but only on remote nodes, what are the consequences? This >>> will of course prevent the issue of the statement committed on one node and >>> not the other. Also, the end user will still be prevented from running the >>> statement inside the transaction. Moreover, for such statement, say >>> create-database, the database will be created on all nodes or none, even if >>> one of the nodes return error. The only issue is, if the create-database is >>> aborted, it will leave disk space wasted on nodes where it has succeeded. >>> But this will be caused because of some configuration issues like disk >>> space, network down etc. The issue of other unrelated operations in the >>> same transaction causing rollback of create-database will not occur anyways >>> because we still don't allow it in a transaction block for the end-user. >>> >>> So the end result is we have solved the inconsistent cluster issue, >>> leaving some chances of disk cleanup issue, although not due to >>> user-queries getting aborted. So may be when such statements error out, we >>> display a notice that files need to be cleaned up. >>> >> Could it be possible to store somewhere in the PGDATA folder of the node >> involved the files that need to be cleaned up? We could use for this >> purpose some binary encoding or something. Ultimately this would finish >> just by being a list of files inside PGDATA to be cleaned up. >> We could then create a system function that unlinks all the files whose >> name have been stored on local node. As such a system function does not >> interact with other databases it could be immutable in order to allow a >> clean up from coordinator with EXECUTE DIRECT. >> > In an unfortnate even that some files remained to be cleaned up, I think it is better to not clean them up ourselved, because we are not sure whether we have aborted because of some half-cooked files, or these files were already there related to some other object. If we delete such pre-existing files, it will leave other objects corrupt. >> >>> We can go further ahead to reduce this window. We split the >>> create-database operation. 
We begin a transaction block, and then let >>> datanodes create the non-file operations first, like inserting pg_database >>> row, etc, by running them using a new function call. Don't commit it yet. >>> Then fire the last part: file system operations, this too using another >>> function call. And then finally commit. This file operation will be under >>> PG_ENSURE_ERROR_CLEANUP(). Due to synchronizing these individual tasks, we >>> reduce the window further. >>> >> We need to be careful here with the impact of our code on PostgreSQL >> code. It would be a pain to have a complecated implementation here for >> future merges. >> > Yes, as you can see in the doc I uploaded, I have come to the conclusion that this splitting is not very much required. The only function where we need to do this is movedb() (i.e. create-db-set-tbspace) where it has internal commits. >> >>> 2. Some statements do internal commits. >>> >>> For e.g. movedb() calls TransactionCommit() after copying the files, and >>> then removes the original files, so that if it crashes while removing the >>> files, the database with the new tablespace is already committed and >>> intact, so we just leave some old files. >>> >>> Such statements doing internal commits cannot be rolled back if run >>> inside transaction block, because they already do some commits. For such >>> statements, the above solution does not work. We need to find a separate >>> way for these specific statements. Few of such statements include: >>> ALTER DATABASE SET TABLESPACE >>> CLUSTER >>> CREATE INDEX CONCURRENTLY >>> >>> One similar solution is to split the individual tasks that get >>> internally committed using different functions for each task, and run the >>> individual functions on all the nodes synchronously. So the 2nd task does >>> not start until the first one gets committed on all the nodes. Whether it >>> is feasible to split the task is a question, and it depends on the >>> particular command. >>> >> We would need a locking system for each task and each task step like what >> is done for barrier. >> Or a new communication protocol, once again like barriers. Those are once >> again just ideas on the top of my mind. >> > Check the document where it explains how we can split the alter-db-set-tbspace commits. > >> >>> >>> As of now, I am not sure whether we can do some common changes in the >>> way transactions are implemented to find a common solution which does not >>> require changes for individual commands. But I will investigate more. >>> >> Thanks. >> -- >> Michael Paquier >> http://michael.otacoo.com >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > |
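The restriction described above, seen from a client session. The error text is the standard PostgreSQL message for this class of statements; the inconsistent-cluster scenario in the trailing comment paraphrases the mail above rather than an actual run.

    -- Statements in the list above refuse to run in a transaction block:
    BEGIN;
    CREATE DATABASE testdb;
    -- ERROR:  CREATE DATABASE cannot run inside a transaction block
    ROLLBACK;

    -- Run outside a block, the coordinator sends it to every node in
    -- autocommit mode, so a failure on one datanode can leave the database
    -- created and committed on some nodes and missing on others, with no
    -- rollback possible from the coordinator.
    CREATE DATABASE testdb;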
From: Ashutosh B. <ash...@en...> - 2012-07-13 10:14:51
|
Hi All,
I am trying to run make check on XC and getting the following error:
[ashutosh@ubuntu regress]make check
make -C ../../../src/port all
make[1]: Entering directory `/work/xc_cat_update/coderoot/src/port'
make -C ../backend submake-errcodes
make[2]: Entering directory `/work/xc_cat_update/coderoot/src/backend'
make[2]: Nothing to be done for `submake-errcodes'.
make[2]: Leaving directory `/work/xc_cat_update/coderoot/src/backend'
make[1]: Leaving directory `/work/xc_cat_update/coderoot/src/port'
rm -rf ./testtablespace
mkdir ./testtablespace
../../../src/test/regress/pg_regress --inputdir=. --temp-install=./tmp_check --top-builddir=../../.. --dlpath=. --schedule=./parallel_schedule
============== removing existing temp installation ==============
============== creating temporary installation ==============
============== initializing database system ==============
============== starting postmaster ==============
============== starting GTM process ==============
running on port 57432, pooler port 57437 with PID 18593 for Coordinator 1
running on port 57433, pooler port 57438 with PID 18594 for Coordinator 2
running on port 57434 with PID 18604 for Datanode 1
running on port 57435 with PID 18614 for Datanode 2
running on port 57436 with PID 18592 for GTM
============== setting connection information ==============
psql: could not connect to server: No such file or directory
Is the server running locally and accepting connections on Unix domain socket "/tmp/.s.PGSQL.57432"?
command failed: "/work/xc_cat_update/coderoot/src/test/regress/./tmp_check/install//work/xc_cat_update//Build/bin/psql" -X -p 57432 -c "CREATE NODE dn1 WITH (HOST = 'localhost', type = 'datanode', PORT = 57434);" "postgres"
pg_ctl: PID file "/work/xc_cat_update/coderoot/src/test/regress/./tmp_check/data_co1/postmaster.pid" does not exist
Is server running?
pg_regress: could not stop postmaster: exit code was 256
make: *** [check] Error 2
Does anybody have a clue what's going wrong?
--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Enterprise Postgres Company |
From: Michael P. <mic...@gm...> - 2012-07-13 06:45:55
|
Something I am afraid is not possible with an external utility is control of redistribution at node level. For example, an external contrib module or utility will launch SQL queries to xc that have to be treated as global. However, redistribution needs to take care of cases like for example the reduction of nodes for replicated tables. In this case you just need to delete the data from removed nodes. Another easy example is the case of an increase of nodes for replicated tables. You need to pick up data on coordinator and then send it only to the new nodes. Those simple examples need a core management to minimize the work of redistribution inside cluster. On 2012/07/13, at 15:09, Ashutosh Bapat <ash...@en...> wrote: > Even, I am wondering if that would be better. > > But, one thing that is essential is catalog updates. Are you suggesting that the catalog updates too should be done using some SQL? > > Can you please expand more on your idea, may be providing some examples, pseudo-code etc.? > > On Fri, Jul 13, 2012 at 10:36 AM, Nikhil Sontakke <ni...@st...> wrote: > Just a thought. > > If we have a utility which spews out all of these statements to > redistribute a table across node modifications, then we can just wrap > them inside a transaction block and just run that? > > Wont it save all of the core changes? > > Regards, > Nikhils > > On Fri, Jul 13, 2012 at 12:29 AM, Michael Paquier > <mic...@gm...> wrote: > > Hi all, > > > > Please find attached an updated patch adding redistribution optimizations > > for replicated tables. > > If the node subset of a replicated table is reduced, the necessary nodes are > > simply truncated. > > If it is increased, a COPY TO is done to fetch the data, and COPY FROM is > > done only on the necessary nodes. > > New regression tests have been added to test that. > > > > Regards, > > > > > > On Thu, Jul 12, 2012 at 5:30 PM, Michael Paquier <mic...@gm...> > > wrote: > >> > >> OK, here is the mammoth patch: 3000 lines including docs, implementation > >> and regressions. > >> The code has been realigned with current master. > >> This patch introduces the latest thing I am working on: the redistribution > >> command tree planning and execution. > >> > >> As I explained before, a redistribution consists of a series of commands > >> (TRUNCATE, REINDEX, DELETE, COPY FROM, COPY TO) that need to be determined > >> depending on the new and old locator information of the relation. Each > >> action can be done on a subset of nodes. > >> This patch introduces the basic infrastructure of the command tree build > >> and execution. > >> For the time being, redistribution uses only what is called the default > >> command tree consisting of: > >> 1) COPY TO > >> 2) TRUNCATE > >> 3) COPY FROM > >> 4) REINDEX > >> But this structure can be easily completed with more complicated > >> operations. > >> In this patch there is still a small thing missing which is the > >> possibility to launch a COPY FROM on a subset of nodes, particularly useful > >> when redistribution consists of a replicated table whose set of nodes is > >> increased. > >> Compared to the last versions, the impact of redistribution in tablecmds.c > >> is limited. 
> >> > >> Regards, > >> > >> -- > >> Michael Paquier > >> http://michael.otacoo.com > > > > > > > > > > -- > > Michael Paquier > > http://michael.otacoo.com > > > > ------------------------------------------------------------------------------ > > Live Security Virtual Conference > > Exclusive live event will cover all the ways today's security and > > threat landscape has changed and how IT managers can respond. Discussions > > will include endpoint security, mobile security and the latest in malware > > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > _______________________________________________ > > Postgres-xc-developers mailing list > > Pos...@li... > > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > > > > > -- > StormDB - http://www.stormdb.com > The Database Cloud > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > |
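A sketch of the replicated-table cases described above, using the TO NODE clause from the patch being reviewed in this thread; the clause spelling is provisional and the table/node names are invented. The comments describe the internal per-node work being argued for, which plain SQL sent through a coordinator cannot express.

    -- Replicated table currently stored on (dn1, dn2).

    -- Shrinking the node set: internally only dn2 needs a TRUNCATE;
    -- dn1 is left untouched.
    ALTER TABLE ref_currency TO NODE (dn1);

    -- Growing the node set: internally the coordinator fetches the rows once
    -- (COPY ... TO) and replays them only on the new node dn3 (COPY ... FROM
    -- aimed at dn3), instead of reloading every node.
    ALTER TABLE ref_currency TO NODE (dn1, dn2, dn3);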
From: Ashutosh B. <ash...@en...> - 2012-07-13 06:10:02
|
Even, I am wondering if that would be better. But, one thing that is essential is catalog updates. Are you suggesting that the catalog updates too should be done using some SQL? Can you please expand more on your idea, may be providing some examples, pseudo-code etc.? On Fri, Jul 13, 2012 at 10:36 AM, Nikhil Sontakke <ni...@st...>wrote: > Just a thought. > > If we have a utility which spews out all of these statements to > redistribute a table across node modifications, then we can just wrap > them inside a transaction block and just run that? > > Wont it save all of the core changes? > > Regards, > Nikhils > > On Fri, Jul 13, 2012 at 12:29 AM, Michael Paquier > <mic...@gm...> wrote: > > Hi all, > > > > Please find attached an updated patch adding redistribution optimizations > > for replicated tables. > > If the node subset of a replicated table is reduced, the necessary nodes > are > > simply truncated. > > If it is increased, a COPY TO is done to fetch the data, and COPY FROM is > > done only on the necessary nodes. > > New regression tests have been added to test that. > > > > Regards, > > > > > > On Thu, Jul 12, 2012 at 5:30 PM, Michael Paquier < > mic...@gm...> > > wrote: > >> > >> OK, here is the mammoth patch: 3000 lines including docs, implementation > >> and regressions. > >> The code has been realigned with current master. > >> This patch introduces the latest thing I am working on: the > redistribution > >> command tree planning and execution. > >> > >> As I explained before, a redistribution consists of a series of commands > >> (TRUNCATE, REINDEX, DELETE, COPY FROM, COPY TO) that need to be > determined > >> depending on the new and old locator information of the relation. Each > >> action can be done on a subset of nodes. > >> This patch introduces the basic infrastructure of the command tree build > >> and execution. > >> For the time being, redistribution uses only what is called the default > >> command tree consisting of: > >> 1) COPY TO > >> 2) TRUNCATE > >> 3) COPY FROM > >> 4) REINDEX > >> But this structure can be easily completed with more complicated > >> operations. > >> In this patch there is still a small thing missing which is the > >> possibility to launch a COPY FROM on a subset of nodes, particularly > useful > >> when redistribution consists of a replicated table whose set of nodes is > >> increased. > >> Compared to the last versions, the impact of redistribution in > tablecmds.c > >> is limited. > >> > >> Regards, > >> > >> -- > >> Michael Paquier > >> http://michael.otacoo.com > > > > > > > > > > -- > > Michael Paquier > > http://michael.otacoo.com > > > > > ------------------------------------------------------------------------------ > > Live Security Virtual Conference > > Exclusive live event will cover all the ways today's security and > > threat landscape has changed and how IT managers can respond. Discussions > > will include endpoint security, mobile security and the latest in malware > > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > _______________________________________________ > > Postgres-xc-developers mailing list > > Pos...@li... 
> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > > > > > -- > StormDB - http://www.stormdb.com > The Database Cloud > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Michael P. <mic...@gm...> - 2012-07-13 05:12:28
|
On Fri, Jul 13, 2012 at 2:06 PM, Nikhil Sontakke <ni...@st...> wrote:
> Just a thought.
>
> If we have a utility which spews out all of these statements to
> redistribute a table across node modifications, then we can just wrap
> them inside a transaction block and just run that?
>
> Wont it save all of the core changes?
>
On the contrary, it is a core feature. You need to provide users with a direct way to rebalance data within the core of XC. Btw, I wrote my stuff so that the core is minimally impacted, touching only gram.y and tablecmds.c.
--
Michael Paquier
http://michael.otacoo.com |