From: Nikhil S. <ni...@st...> - 2012-07-04 02:50:03
> I also believe it's not a good idea to monitor a datanode through a
> coordinator using EXECUTE DIRECT because the latter may fail
> while the whole cluster is in operation.

Well, if there are multiple failures we ought to know about them anyway. If this particular coordinator fails, the monitor tells us about it first; we fix it and then move on to datanode failure detection. Since the datanodes have to be reachable via coordinators, and we have multiple coordinators around for load balancing anyway, I still think EXECUTE DIRECT via a coordinator node is a decent idea. If we can round-robin the calls across all the coordinators, that would be better still.

Regards,
Nikhils
--
StormDB - http://www.stormdb.com
The Database Cloud
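The round-robin probing Nikhil sketches above can be expressed as a small driver-agnostic loop. This is only an illustrative sketch: the host names, the `run_probe` callback, and the exact EXECUTE DIRECT text are assumptions, not part of any XC tool.

```python
from itertools import cycle

def probe_datanode(coordinators, datanode, run_probe, attempts=None):
    """Try each coordinator in round-robin order until one can reach the
    datanode.  `run_probe(coord, sql)` is a caller-supplied callable that
    returns True on success (e.g. by running the statement through libpq);
    the EXECUTE DIRECT text mirrors the syntax discussed in the thread."""
    sql = "EXECUTE DIRECT ON (%s) 'SELECT 1'" % datanode
    attempts = attempts if attempts is not None else len(coordinators)
    rotation = cycle(coordinators)
    for _ in range(attempts):
        coord = next(rotation)
        if run_probe(coord, sql):
            return coord          # this coordinator reached the datanode
    return None                   # every coordinator failed the probe
```

A monitoring script would call this once per check cycle; a `None` result means either all coordinators are down or the datanode is unreachable, which is exactly the multiple-failure case discussed above.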
From: Koichi S. <koi...@gm...> - 2012-07-04 01:03:31
I also found that it is quite difficult to have both gtm_client.h and libpq-fe.h in the same binary. gtm_client.h includes gtm/libpq-fe.h, which has many naming conflicts with libpq-fe.h. I remember this was a major issue when I tried the first pgxc_clean implementation; in the end I modified the core to make pgxc_clean a complete XC application.

This time, I don't think we should depend on some coordinator, because it could fail while the whole cluster is in operation. I need direct access to GTM. Because it is neither simple nor practical to resolve the conflicts between gtm/libpq-fe.h and libpq-fe.h, I'd like to write separate monitor commands, gtm_monitor and node_monitor.

To discourage direct connections from applications to datanodes, I'd still like to provide a dedicated monitoring command for datanodes (and coordinators). I also believe it's not a good idea to monitor a datanode through a coordinator using EXECUTE DIRECT, because the coordinator itself may fail while the whole cluster is in operation.

Regards;
----------
Koichi Suzuki
From: Michael P. <mic...@gm...> - 2012-07-04 00:56:16
On Wed, Jul 4, 2012 at 9:31 AM, Koichi Suzuki <koi...@gm...> wrote:

> The background of xc_watchdog is to provide a quicker means to detect
> node faults. I understand that it is not compatible with what we're
> doing in conventional PG applications, which are mostly based upon
> psql -c 'select 1'. It takes at most 60sec to detect the error (TCP
> timeout value). Some applications will be satisfied with this and
> some may not. This was raised at the clustering summit in Ottawa and
> the suggestion was to have this kind of means (watchdog).

It is also possible to set keepalive options in the connection string used to connect to the database server. The "SELECT 1" approach is just one option. You could also use the hooks implemented in vanilla Postgres to write some monitoring activity back into the logs. Those hooks could be created in an extension module pluggable directly into XC, which keeps the core free of dependencies on monitoring.

> I don't know if PG people are interested in this now. Maybe we
> should wait until such fault detection is a more realistic issue.
> Implementation is very straightforward.

We should definitely discuss that with them. They may have reasons not to do it yet; in that case I don't know which ones. What is sure, however, is that core code should not depend on monitoring.

> For datanode, I don't like to ask applications to connect to it
> directly using psql because it is a kind of tricky use and it may mean
> that we allow applications to connect to datanodes directly. So I
> think we should encapsulate this with a dedicated command like
> xc_monitor.
>
> A command like 'xc_monitor -Z nodetype -h host -p port' will not need
> any modification to the core. It will be submitted soon as a contrib
> module.

This is OK I think. The main point here is how to check whether the database is alive, and there are already solutions ready to be used. We just cannot enforce a solution inside our core code that may impact all users, including people who would like to monitor their databases with solutions like nagios.

--
Michael Paquier
http://michael.otacoo.com
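The keepalive options Michael mentions correspond to libpq's keepalives_idle, keepalives_interval and keepalives_count connection parameters, which map to socket options. A minimal socket-level sketch follows; note the per-probe tunables (TCP_KEEPIDLE and friends) are platform specific, so the Linux names here are an assumption about the target OS.

```python
import socket

def enable_keepalive(sock, idle=10, interval=5, probes=3):
    """Turn on TCP keepalive so a dead peer is detected after roughly
    idle + interval * probes seconds instead of waiting for the default
    TCP timeout.  libpq exposes the same knobs as the keepalives_idle,
    keepalives_interval and keepalives_count connection-string options."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-probe tunables are platform specific (shown here for Linux).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

With settings like these a monitoring client notices a dead server in well under a minute, without any core-code change, which is the point being made in this message.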
From: Michael P. <mic...@gm...> - 2012-07-04 00:50:48
On Wed, Jul 4, 2012 at 9:34 AM, Koichi Suzuki <koi...@gm...> wrote:

> This is a very important and big issue. Before reviewing the patch, I
> think you should upload the idea somewhere else and point to it.

I don't think that is necessary. There have already been a lot of discussions about it, and there are design documents and documentation in the patch. You can refer to the previous emails about it, where I give a full explanation of the method used and the means to achieve it. Amit also provided some feedback.

- mail 1, about the feature:
https://sourceforge.net/mailarchive/forum.php?thread_name=CAB7nPqSTiErd6ne_DECZXdo3F4yCZiq-0%2BOBkggFJm9SfT97-w%40mail.gmail.com&forum_name=postgres-xc-developers
- mail 2, about extracting the COPY APIs:
https://sourceforge.net/mailarchive/forum.php?thread_name=CACoZds27MPrkcSeo0EX3jE94hZ%3Dhn-oxK38BW99c7zP-%2B%2BBV6w%40mail.gmail.com&forum_name=postgres-xc-developers

I also wrote a design document explaining all the goals of this feature, based on the spec designed by Ashutosh. If this is not enough, please mention what is missing and I will point to it.

> I don't think I fully understand the idea so far.

Reading the documentation in the patch, as well as the email thread explaining the method used and the means to achieve it, would help, I believe. The patch, design docs, and discussion threads are all there, so it is just a matter of following the events.

Thanks,
--
Michael Paquier
http://michael.otacoo.com
From: Koichi S. <koi...@gm...> - 2012-07-04 00:34:55
This is a very important and big issue. Before reviewing the patch, I think you should upload the idea somewhere else and point to it. I don't think I fully understand the idea so far.

Regards;
----------
Koichi Suzuki

2012/7/3 Michael Paquier <mic...@gm...>:
> Hi all,
>
> Please find attached 2 patches: 20120703_remotecopy.patch and
> 20120703_altertable_distrib.patch.
>
> 20120703_remotecopy.patch is a lightly modified version of a patch that
> has already been reviewed by Amit, where the COPY protocol used by XC
> code in copy.c is extracted into an external file. This removes a lot of
> code from copy.c and simplifies the comprehension of the protocol used.
> The only part modified is in RemoteCopy_GetRelationLoc, where we scan all
> the attributes of a relation to find the distribution column of a table
> in case the list of attribute numbers is not specified. This patch has
> already been reviewed and can be committed as-is, I think.
>
> Now the real part: online data redistribution is managed by the second
> patch, 20120703_altertable_distrib.patch. I am not coming back to the
> design of the feature that has been chosen. The main modification
> introduced by this patch is the use of a tuplestore to hold the tuples
> that need to be redistributed in the cluster. The patch also contains new
> features that allow the Coordinator to materialize into a tuple slot the
> raw data received through the COPY protocol, so that a tuple can be
> redirected to the correct node if the new distribution type is hash or
> modulo. This new mechanism can be used not only for data redistribution
> but also to facilitate the exchange of data between nodes (with direct
> consequences for triggers and global constraints). The reverse
> transformation (from tuple slot to raw data) is also included in this
> patch.
>
> This patch is something like 2000 lines, and does not yet contain the
> following features, which will be added by other patches once this is
> committed:
> - no need to materialize into a tuple slot if the new distribution is
>   replication;
> - no optimization yet when a replicated table's subcluster is reduced
>   (we need only send TRUNCATE to the correct nodes);
> - no optimization yet when a replicated table's subcluster is increased
>   (we need only send tuples to the new nodes after fetching them from
>   the old nodes).
> However, I wrote the patch in such a way that those optimizations are
> easy to implement in the current infrastructure.
>
> Please note that this patch contains all the documentation and regression
> tests. So, does anyone have the courage to provide comments on it?
> --
> Michael Paquier
> http://michael.otacoo.com
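The hash/modulo routing the redistribution patch performs on materialized tuples can be illustrated with a toy router. This is only a sketch: the node names are invented, and Python's built-in `hash` stands in for XC's server-side hashing functions, which this code does not reproduce.

```python
def route_tuple(dist_key, node_list, dist_type="hash"):
    """Pick the datanode a tuple belongs to under hash or modulo
    distribution: the value of the distribution column is reduced to a
    bucket number modulo the number of nodes in the table's subcluster.
    When the distribution type changes (ALTER TABLE ... DISTRIBUTE BY),
    every stored tuple must be re-run through a function like this and
    shipped to the node it now maps to."""
    if dist_type == "modulo":
        bucket = int(dist_key) % len(node_list)
    elif dist_type == "hash":
        bucket = hash(str(dist_key)) % len(node_list)
    else:
        raise ValueError("unsupported distribution type: %s" % dist_type)
    return node_list[bucket]
```

This also shows why redistribution needs the tuple materialized in a slot: the raw COPY data must be parsed far enough to extract the distribution column before the target node can be computed.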
From: Koichi S. <koi...@gm...> - 2012-07-04 00:31:59
The background of xc_watchdog is to provide a quicker means to detect node faults. I understand that it is not compatible with what we're doing in conventional PG applications, which are mostly based upon psql -c 'select 1'. That takes at most 60 seconds to detect an error (the TCP timeout value). Some applications will be satisfied with this and some may not. The issue was raised at the clustering summit in Ottawa, and the suggestion was to have this kind of means (a watchdog).

I don't know whether the PG people are interested in this now. Maybe we should wait until such fault detection is a more realistic issue. The implementation is very straightforward.

For datanodes, I don't like asking applications to connect directly using psql, because that is a tricky use and may imply that we allow applications to connect to datanodes directly. So I think we should encapsulate this in a dedicated command like xc_monitor. xc_ping sounds good too, but "ping" suggests continuous monitoring, and the current practice needs only a single check. So I'd prefer xc_monitor (or node_monitor).

A command like 'xc_monitor -Z nodetype -h host -p port' will not need any modification to the core. It will be submitted soon as a contrib module.

Regards;
----------
Koichi Suzuki
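The watchdog mechanism being debated — each component periodically bumping a timestamp that a detector compares against a threshold — can be sketched independently of any XC code. The class name and the injectable clock are invented here for illustration; the actual patch works inside the server processes, not in Python.

```python
import time

class WatchdogTimer:
    """Liveness detector: the monitored process calls pulse() on a
    schedule; the monitor calls is_stale() and treats a missed deadline
    as a suspected node fault, well before any TCP timeout would fire."""
    def __init__(self, threshold_sec, clock=time.monotonic):
        self.threshold = threshold_sec
        self.clock = clock                 # injectable for testing
        self.last_pulse = self.clock()

    def pulse(self):
        self.last_pulse = self.clock()     # heartbeat from the component

    def is_stale(self):
        return (self.clock() - self.last_pulse) > self.threshold
```

The trade-off discussed in the thread is visible even in this sketch: the pulse/check machinery has to live inside (or next to) every monitored component, which is exactly the core-code dependency the objectors want to avoid.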
From: Michael P. <mic...@gm...> - 2012-07-03 23:38:15
On Wed, Jul 4, 2012 at 8:02 AM, Nikhil Sontakke <ni...@st...> wrote:

> > Are there people with a similar opinion to mine???
>
> +1
>
> IMO too we should not be making any too-invasive internal changes to
> support monitoring. It would be better to allow commands which can be
> scripted and which can work against each of the components.

This could be managed more easily by creating new system functions for monitoring, written in C as an EXTENSION and pluggable as a contrib module.

> For example, for the coordinator/datanode, periodic "SELECT 1" commands
> should be good enough. Even doing an EXECUTE DIRECT via a coordinator
> to the datanodes will help.
>
> For the GTM/GTM_Standby/GTM_Proxy components we should introduce
> "gtm_ctl ping" kinds of commands which will basically connect to them
> and check that they are responding OK.

That is an interesting idea. By the way, we should definitely avoid any additional GUC parameters inside the GTM core code. That avoids complicating cluster settings, and users may use a monitoring solution different from the one proposed.

> Such interfaces make it really easy for monitoring solutions like
> nagios, zabbix etc. to monitor them. These tools have been used for a
> while now to monitor Postgres, and it should be a natural evolution
> for users to see them being used for PG XC.

Completely agreed; we do not need to reinvent solutions that already exist and have proved sufficient.

--
Michael Paquier
http://michael.otacoo.com
From: Nikhil S. <ni...@st...> - 2012-07-03 23:02:43
> Are there people with a similar opinion to mine???

+1

IMO we should not be making any too-invasive internal changes to support monitoring. It would be better to allow commands which can be scripted and which can work against each of the components.

For example, for the coordinator/datanode, periodic "SELECT 1" commands should be good enough. Even doing an EXECUTE DIRECT via a coordinator to the datanodes will help.

For the GTM/GTM_Standby/GTM_Proxy components we should introduce "gtm_ctl ping" kinds of commands which will basically connect to them and check that they are responding OK.

Such interfaces make it really easy for monitoring solutions like nagios, zabbix etc. to monitor them. These tools have been used for a while now to monitor Postgres, and it should be a natural evolution for users to see them being used for PG XC.

Regards,
Nikhils
--
StormDB - http://www.stormdb.com
The Database Cloud
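Scriptable checks like the ones described here plug into Nagios or Zabbix through exit codes. The sketch below wires a caller-supplied probe into the conventional Nagios statuses (0=OK, 1=WARNING, 2=CRITICAL); the probe itself — shelling out to psql -c 'SELECT 1', gtm_ctl ping, or xc_monitor — is deliberately left abstract, and the retry policy is an invented example, not part of any tool mentioned in the thread.

```python
OK, WARNING, CRITICAL = 0, 1, 2   # conventional Nagios plugin exit codes

def check_component(name, probe, warn_after=1, tries=3):
    """Run a liveness probe up to `tries` times and map the outcome to a
    Nagios status.  `probe()` returns True when the component answered,
    e.g. via psql -c 'SELECT 1' for a coordinator/datanode or
    'gtm_ctl ping' for GTM components."""
    failures = 0
    for _ in range(tries):
        if probe():
            status = WARNING if failures >= warn_after else OK
            msg = "%s responded after %d failed attempt(s)" % (name, failures)
            return status, msg
        failures += 1
    return CRITICAL, "%s did not respond in %d attempts" % (name, tries)
```

A real plugin would print the message and `sys.exit(status)`, which is all a scheduler like Nagios needs; no change to the server core is involved, which is the point of this message.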
From: Michael P. <mic...@gm...> - 2012-07-03 05:04:45
Hum, I am honestly not a fan of this way of doing things. I just cannot see the point of touching the core code for a feature which only does monitoring. We should discuss this with the Postgres community and get feedback about the possible solutions we could use here, and really think a lot before touching code parts that we haven't touched yet, which might impact PostgreSQL code itself if there are any side effects.

What I cannot understand is why we would add an internal chronometer when there are already options available:
- using a simple "SELECT 1" on the database;
- using pg_ctl status.

This implementation makes the core code dependent on monitoring features when it should definitely be the opposite: database server monitoring shouldn't touch the core, but only use its functionality. And even if PostgreSQL did need such a feature, XC should extend in a cluster-aware way what already exists in Postgres. So why reinvent the wheel?

Also, this patch adds a total of 6 GUC parameters: 2 for GTM, 2 for GTM-proxy and 2 for Coordinator/Datanode. That complicates the feature too much.

Instead of creating so many dependencies in the Postgres code, why not create a simple system function that returns a confirmation message to the client at a given time interval? Let's imagine the system function pgxc_watchdog(interval, cycles). It could look like this:

    Datum
    pgxc_watchdog(interval time, int cycles)
    {
        int i;

        for (i = 0; i < cycles; i++)
        {
            sleep(interval);
            /* Send a result back to the client, e.g. a 'SELECT 1' result */
            send_back('SELECT 1 result');
        }
    }

There are a lot of benefits to doing that:
- it does not touch the core for monitoring purposes (really, really important to my mind);
- it reduces the GUC parameters by 6;
- this implementation is portable and easy to maintain; you can also create a similar function to check GTM status from an XC node;
- you do not need an additional external module to read the monitoring pulse: you connect to a PostgreSQL server with a given client and launch the function through a driver, whatever it is, so it can easily be adapted to all kinds of implementations and applications.

Are there people with a similar opinion to mine???

On Mon, Jul 2, 2012 at 2:25 PM, Koichi Suzuki <koi...@gm...> wrote:

> Hi,
>
> Enclosed is a WIP patch for xc_watchdog, for coordinator/datanode/gtm
> and gtm_proxy. It is against the current master as of June 2nd,
> 2:00PM JST. I've tested it with gdb and found that the watchdog timer
> is incremented as expected. I will write the timeout detector and
> continue testing.
>
> Regards;
> ----------
> Koichi Suzuki

--
Michael Paquier
http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2012-07-03 04:46:40
On Fri, Jun 29, 2012 at 8:34 PM, Amit Khandekar <ami...@en...> wrote:

> For utility statements in general, the coordinator propagates SQL
> statements to all the required nodes, and most of these statements get
> run on the datanodes inside a transaction block. So, when the statement
> fails on at least one of the nodes, the statement gets rolled back on all
> the nodes due to the two-phase commit taking place, and the cluster
> therefore rolls back to a consistent state. But there are some statements
> which cannot be run inside a transaction block. Here are some important
> ones:
>   CREATE/DROP DATABASE
>   CREATE/DROP TABLESPACE
>   ALTER DATABASE SET TABLESPACE
>   ALTER TYPE ADD ... (for enum types)
>   CREATE INDEX CONCURRENTLY
>   REINDEX DATABASE
>   DISCARD ALL
>
> Such statements run on datanodes in auto-commit mode, and so create
> problems if they succeed on some nodes and abort on others. Take CREATE
> DATABASE, for example. If a datanode d1 returns an error, and another
> datanode d2 has already returned success to the coordinator, the
> coordinator can't undo the commit of d2, because it is already committed.
> Or if the coordinator itself crashes after the datanodes commit but
> before the coordinator commits, then again we have the same problem: the
> database cannot be re-created from the coordinator, since it already
> exists on some of the other nodes. In such a cluster state, the
> administrator needs to connect to the datanodes and do the needed
> cleanup.
>
> The committed statements can be followed by statements that undo the
> operation, e.g. DROP DATABASE for a CREATE DATABASE. But here again this
> statement can fail for some reason. Also, typically for such statements,
> their UNDO counterparts themselves cannot be run inside a transaction
> block either. So this is not a guaranteed way to bring the cluster back
> to a consistent state.
>
> To find out how we can get around this issue, let's see why these
> statements are required to run outside a transaction block in the first
> place. There are two reasons:
>
> 1. Typically such statements modify OS files and directories, which
> cannot be rolled back.
>
> For DMLs, a rollback does not have to be explicitly undone; MVCC takes
> care of it. But for OS file operations there is no automatic way, so such
> operations cannot be rolled back. In a transaction block, if a
> create-database were followed by 10 other SQL statements before commit,
> and one of those statements threw an error, the database would ultimately
> not be created, but there would be database files taking up disk space,
> just because the user wrote the script wrongly. So by restricting such a
> statement to run outside a transaction block, an unrelated error won't
> cause garbage files to be created.
>
> The statement itself does get committed eventually as usual, and it can
> also get rolled back in the end. But maximum care has been taken in the
> statement function (e.g. createdb) so that the chance of an error
> occurring *after* the files are created is minimal. For this, such a code
> segment is wrapped in PG_ENSURE_ERROR_CLEANUP() with an error callback
> function (createdb_failure_callback) which tries to clean up the files
> created. So the end result is that the window between files-created and
> error-occurred is minimized, not that such statements can never create
> cleanup issues when run outside a transaction block.
>
> Possible solution:
>
> Regarding Postgres-XC, if we let such statements run inside a transaction
> block, but only on the remote nodes, what are the consequences? This will
> of course prevent the issue of the statement being committed on one node
> and not on another, while the end user is still prevented from running
> the statement inside a transaction. Moreover, for such a statement, say
> create-database, the database will be created on all nodes or none, even
> if one of the nodes returns an error. The only issue is that if the
> create-database is aborted, it will leave disk space wasted on the nodes
> where it succeeded. But that would be caused by configuration issues like
> disk space, network down, etc. The issue of other unrelated operations in
> the same transaction causing a rollback of create-database will not occur
> anyway, because we still don't allow it in a transaction block for the
> end user.
>
> So the end result is that we have solved the inconsistent-cluster issue,
> leaving some chance of a disk cleanup issue, although not one caused by
> aborted user queries. So maybe when such statements error out, we should
> display a notice that files need to be cleaned up.

Could it be possible to store, somewhere in the PGDATA folder of the node involved, the list of files that need to be cleaned up? We could use some binary encoding for this, or something similar. Ultimately this would just end up being a list of files inside PGDATA to be cleaned up. We could then create a system function that unlinks all the files whose names have been stored on the local node. As such a system function does not interact with other databases, it could be immutable, in order to allow a cleanup from a coordinator with EXECUTE DIRECT.

> We can go further to reduce this window. We split the create-database
> operation: we begin a transaction block, then let the datanodes do the
> non-file operations first (like inserting the pg_database row) using a
> new function call, without committing yet. Then we fire the last part,
> the file system operations, using another function call, and finally
> commit. The file operation will be under PG_ENSURE_ERROR_CLEANUP(). By
> synchronizing these individual tasks, we reduce the window further.

We need to be careful here about the impact of our code on the PostgreSQL code. A complicated implementation here would be a pain for future merges.

> 2. Some statements do internal commits.
>
> For example, movedb() calls TransactionCommit() after copying the files,
> and then removes the original files, so that if it crashes while removing
> the files, the database with the new tablespace is already committed and
> intact, and we just leave some old files behind.
>
> Statements doing internal commits cannot be rolled back if run inside a
> transaction block, because they have already done some commits. For such
> statements the above solution does not work; we need to find a separate
> way for them. A few such statements are:
>   ALTER DATABASE SET TABLESPACE
>   CLUSTER
>   CREATE INDEX CONCURRENTLY
>
> One similar solution is to split the individually committed tasks into
> different functions, and run the individual functions on all the nodes
> synchronously, so that the second task does not start until the first one
> is committed on all the nodes. Whether it is feasible to split the task
> is a question, and it depends on the particular command.

We would need a locking system for each task and each task step, like what is done for barriers. Or a new communication protocol, once again like barriers. Those are, once again, just ideas off the top of my mind.

> As of now, I am not sure whether we can make some common changes in the
> way transactions are implemented to find a common solution which does not
> require changes for individual commands. But I will investigate more.

Thanks.
--
Michael Paquier
http://michael.otacoo.com
From: Koichi S. <koi...@gm...> - 2012-07-02 05:25:30
Hi,

Enclosed is a WIP patch for xc_watchdog, for coordinator/datanode/gtm and gtm_proxy. It is against the current master as of June 2nd, 2:00PM JST. I've tested it with gdb and found that the watchdog timer is incremented as expected. I will write the timeout detector and continue testing.

Regards;
----------
Koichi Suzuki