From: Amit K. <ami...@en...> - 2012-03-02 10:44:54
|
On 2 March 2012 13:36, Michael Paquier <mic...@gm...> wrote:
> Hi,
>
> I wrote up a document summarizing the trigger situation.
> It summarizes the current situation, the current limitations, and ways to
> solve the current problems.

Thanks Michael for preparing the writeup.

In my opinion, if correcting GetTupleForTrigger() would make the update/delete triggers work correctly, that would really be great. That way, at least we can say that all of update/delete/insert triggers are supported, although performance would degrade because the trigger functions run on the coordinator. And user applications would work even if a table has triggers, which is at least better than blocking out such applications. And we are also not losing correctness here.

If the above method is successful, we can think of various ways to increase performance by shipping triggers to datanodes in certain cases, but we would have to do this for > 1.0. We might even be able to read the SQL inside a trigger function (say, only plpgsql) and somehow determine from all the tables used that the operation can be done on one particular datanode and that no volatile functions are called, and ship the trigger only if this is true. Something similar to what Ashutosh mentioned in an earlier mail thread.

I personally feel a bit uncomfortable with the option of running triggers on datanodes on the user's responsibility. I am still not sure how we would guide the user about the scenarios where he could safely mark the functions immutable so that the function gets shipped.

> Feel free to comment.
>
> On Thu, Mar 1, 2012 at 11:26 PM, Koichi Suzuki <koi...@gm...> wrote:
>>
>> Yes, please! It will be helpful if you write down what could be an
>> ideal solution for this and how far we are at now.
>> ----------
>> Koichi Suzuki
>>
>> 2012/3/1 Michael Paquier <mic...@gm...>:
>> >
>> > On 2012/03/01, at 19:33, Amit Khandekar <ami...@en...> wrote:
>> >
>> > On 1 March 2012 15:29, Koichi Suzuki <koi...@gm...> wrote:
>> >>
>> >> Well, originally we designed Triggers to be fired at datanodes. It
>> >> seemed simpler. However, we found that Trigger definition does not
>> >> fit to this idea. Maybe we need to do the following as the current
>> >> stage before we step further:
>> >>
>> >> 1. Write a simple document to evaluate the original design, including
>> >> its assumptions and what is not practical.
>> >
>> > Yes, +1 for coming up with the exact document notes. I personally find it
>> > difficult to come up with the precise constraints for trigger behaviour on
>> > datanodes for immutable functions. There seem to be lots of constraints
>> > and ifs and buts. But we can decide after seeing how simple the
>> > functionality notes look.
>> >
>> > OK. I'll type up something about that to make it clear.
>> >
>> >> 2. If the original design is not good (and I'm afraid so), suggest a
>> >> right design direction.
>> >>
>> >> Anyway, I think the current INSERT trigger will survive these debates.
>> >>
>> >> Regards;
>> >> ----------
>> >> Koichi Suzuki
>> >>
>> >> 2012/3/1 Michael Paquier <mic...@gm...>:
>> >> > So how to do?
>> >> >
>> >> > For the time being I haven't written anything in the docs regarding this
>> >> > idea. What is simply written is: "XC does not support triggers with
>> >> > stable/volatile functions for update, delete and truncate". So for a
>> >> > first step of implementation I think it is enough.
>> >> >
>> >> > The second step of implementation, which would be to add trigger support
>> >> > for stable/volatile functions for update/delete/truncate, would be
>> >> > heavier than I thought.
>> >> > My question is: do you think it is worth the effort for 1.0?
>> >> >
>> >> > If the answer is yes, I'll try to work on this support, but at first
>> >> > sight it may be pretty heavy. And it would perhaps be better to spend
>> >> > time on the stability issues rather than heavy implementations.
>> >> > Regards,
>> >> >
>> >> > --
>> >> > Michael Paquier
>> >> > http://michael.otacoo.com
>
> --
> Michael Paquier
> http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2012-03-02 08:06:58
|
Trigger in XC: current implementation, limitations and ideas
NTT Data Intellilink, 2012.03.02

This document presents the current trigger implementation, its limitations, and the different scenarios that could lead to full trigger support.

1. Implementation

In a cluster-wide environment, the trigger firing decision depends on two parameters:
- shippability of the procedure fired by the trigger
- shippability of the query invoking the trigger

So there are 4 possible cases of trigger firing:
- if the procedure is shippable and the query invoking the trigger is shippable, push down everything and fire the trigger on the Datanode.
- if the procedure is not shippable and the query invoking the trigger is shippable, fall back to the standard planner and fire the trigger on the Coordinator.
- if the procedure is shippable and the query invoking the trigger is not shippable, fall back to the standard planner and fire the trigger on the Coordinator.
- if the procedure is not shippable and the query invoking the trigger is not shippable, fall back to the standard planner and fire the trigger on the Coordinator.

If the query is not shippable, it has to go through the standard planner to use the extended query protocol (normal behavior). Ex:

    select * from table where col = currval('seq');

If the trigger procedure is not shippable (a volatile or stable function), this function by definition needs the Coordinator to fetch a correct value. As with normal procedures, you cannot fetch a value that is consistent for the cluster if non-shippable procedures are not fired on the Coordinator.

So the current implementation plan is to add in the XC planner a check on trigger shippability. If for a given query the trigger is not shippable, the XC planner falls back to the standard planner, where the trigger is fired on the Coordinator. This would work correctly for INSERT, UPDATE and DELETE triggers.

For TRUNCATE triggers, in case the procedure fired is immutable, it is enough to ship the trigger. If the procedure is not shippable, there are 2 cases to consider:
- FOR EACH STATEMENT BEFORE: fire the trigger on the Coordinator before launching TRUNCATE on the remote nodes.
- FOR EACH STATEMENT AFTER: fire the trigger on the Coordinator after launching TRUNCATE on the remote nodes.
This support needs minimal effort, as triggers on TRUNCATE are only statement-based.

2. Current limitations

Here is what is currently supported with the latest patch:
- INSERT triggers are fully implemented, for all types of procedures.
- UPDATE, DELETE and TRUNCATE triggers are supported only if the procedure is shippable.

Support for INSERT triggers is possible because the tuple slot that can be modified by the trigger is immediately available when executing a remote query; all the new values of the INSERT are there. For UPDATE and DELETE, triggers show the following type of error:

    update tttest set price_val = 30 where price_id = nextval('seq');
    -- non-shippable query firing a non-shippable trigger
    ERROR:  could not read block 0 in file "base/12054/t3_16388": read only 0 of 8192 bytes

This is due to the function GetTupleForTrigger in trigger.c, where we try to get a tuple for the trigger on the local node (here the Coordinator); this should be replaced by a mechanism that recovers the tuple from the remote node instead of from the local node.

3. Scenarios to support triggers

The ideal scenario would be to modify GetTupleForTrigger so that it can get a tuple from a remote node and fire the trigger correctly on the Coordinator. This would make triggers completely available in XC without any modifications on applications: triggers would remain transparent to the application.

I will try to focus on that for the next couple of days in order to support triggers fully in XC.

However, following Suzuki-san's suggestion, the former design was just to fire triggers on Datanodes. This would make the implementation far simpler, but if the procedure executed by the trigger is non-shippable, this could very easily break cluster consistency, even if the initial plan was to let the DBA take all the responsibility for trigger definitions.

In conclusion, the dilemma is: which way to choose?

[End]
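As an illustration of the shippability distinction described above, here is a minimal sketch against the tttest table from the error example; the extra columns, function names and trigger name are hypothetical:

    -- shippable: an immutable procedure depends only on its own row, so the
    -- trigger can be pushed down and fired on the Datanodes
    CREATE FUNCTION trg_upper_desc() RETURNS trigger AS $$
    BEGIN
        NEW.price_desc := upper(NEW.price_desc);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql IMMUTABLE;

    -- non-shippable: a stable/volatile procedure needs the Coordinator to get
    -- a value that is consistent across the cluster
    CREATE FUNCTION trg_stamp_row() RETURNS trigger AS $$
    BEGIN
        NEW.price_ts := now();
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql STABLE;

    CREATE TRIGGER upper_desc BEFORE INSERT OR UPDATE ON tttest
        FOR EACH ROW EXECUTE PROCEDURE trg_upper_desc();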
From: Koichi S. <koi...@gm...> - 2012-03-02 06:02:36
|
Yes, synchronous streaming replication is available in PGXC too. It is available for HA now. In terms of hot standby, I don't think it makes much sense in the current XC because each datanode backup may have different visibility, due to different duration of WAL playback and we may not have consistent result from all the datanode backups. Regards; ---------- Koichi Suzuki 2012/3/2 Ashutosh Bapat <ash...@en...>: > Hi David, > I had heard about this partitioning in conference in OSI Days, Bangalore. As > Suzuki-san said, this is complicated thing and has planner/transaction > implications. I think, it will take some time till we reach there. > > But, your motivation looks like unavailability of a partition. Can we use > replication with Hot standby for this purpose? > > > > > On Fri, Mar 2, 2012 at 6:43 AM, David E. Wheeler <da...@ju...> > wrote: >> >> PGXCers, >> >> I’m wondering if there is some way to have data in a table partitioned >> among groups of nodes. Right now, I can have data partitioned to, say, three >> data nodes. >> >> CREATE TABLE foo (id integer PRIMARY KEY, name text) >> DISTRIBUTE BY HASH(id) TO dn1, dn2, dn3; >> >> However, if one of them gets its power pulled, the data goes away until >> the node comes back up. >> >> What I’d like to be able to do is to have data partitioned to *groups* of >> nodes. Something like this: >> >> CREATE NODE GROUP dng1 WITH dn1a, dn1b, dn1c; >> CREATE NODE GROUP dng2 WITH dn2a, dn2b, dn2c; >> CREATE NODE GROUP dng3 WITH dn3a, dn3b, dn3c; >> >> Then create the table like this: >> >> CREATE TABLE foo (id integer PRIMARY KEY, name text) >> DISTRIBUTE BY HASH(id) TO dng1, dng2, dng3; >> >> So data written to dng1 would be on all three data nodes in that group. >> Data written to dng2 would be on all three data nodes in that group. And >> likewise for dng3. So if I insert: >> >> INSERT INTO foo(1, 'hi'); >> >> Then that data might be written to dng1, and so be on dn1a, dn1b, and >> dn1c. Another row: >> >> INSERT INTO foo(2, 'yo'); >> >> Might be written to dng2, and so be on dn2a, dn2b, and dn2c; >> >> Essentially, the data is partitioned to groups, and those groups are >> themselves replicated (using two-phase commit, I guess?). The advantage then >> is, if one of those nodes were to go away, the data would still be there. >> That is, if I shut down dn2a, then do >> >> SELECT * FROM foo WHERE id = 2; >> >> It would still return the data, from either dn2b or dn2c. >> >> Is this do-able with Postgres-XC? If not, it something like this planned? >> >> Thanks, >> >> David >> >> >> >> ------------------------------------------------------------------------------ >> Virtualization & Cloud Management Using Capacity Planning >> Cloud computing makes use of virtualization - but cloud computing >> also focuses on allowing computing to be delivered as a service. >> http://www.accelacomm.com/jaw/sfnl/114/51521223/ >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > > ------------------------------------------------------------------------------ > Virtualization & Cloud Management Using Capacity Planning > Cloud computing makes use of virtualization - but cloud computing > also focuses on allowing computing to be delivered as a service. 
> http://www.accelacomm.com/jaw/sfnl/114/51521223/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > |
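As a rough illustration of the HA approach Koichi mentions (a standard PostgreSQL synchronous standby attached to each Datanode), one could check on a Datanode that its standby is streaming synchronously; the configuration values named in the comment are illustrative only:

    -- assumes the datanode's postgresql.conf has, e.g., wal_level = hot_standby,
    -- max_wal_senders > 0 and synchronous_standby_names set for its standby
    SELECT application_name, state, sync_state
    FROM pg_stat_replication;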
From: David E. W. <da...@ju...> - 2012-03-02 05:59:50
|
On Mar 1, 2012, at 9:56 PM, Ashutosh Bapat wrote: > I had heard about this partitioning in conference in OSI Days, Bangalore. > As Suzuki-san said, this is complicated thing and has planner/transaction > implications. I think, it will take some time till we reach there. > > But, your motivation looks like unavailability of a partition. Can we use > replication with Hot standby for this purpose? I’m thinking of using PL/Proxy and building my own tools to copy database shards around using trigger-based replication. But it’s a nascent thought, subject to change in the face of experience. David |
From: Ashutosh B. <ash...@en...> - 2012-03-02 05:56:39
|
Hi David,

I had heard about this kind of partitioning at the OSI Days conference in Bangalore. As Suzuki-san said, this is a complicated thing and has planner/transaction implications. I think it will take some time until we get there.

But your motivation looks to be about the unavailability of a partition. Can we use replication with Hot standby for this purpose?

On Fri, Mar 2, 2012 at 6:43 AM, David E. Wheeler <da...@ju...> wrote:
> PGXCers,
>
> I’m wondering if there is some way to have data in a table partitioned
> among groups of nodes. Right now, I can have data partitioned to, say,
> three data nodes.
>
> CREATE TABLE foo (id integer PRIMARY KEY, name text)
> DISTRIBUTE BY HASH(id) TO dn1, dn2, dn3;
>
> However, if one of them gets its power pulled, the data goes away until
> the node comes back up.
>
> What I’d like to be able to do is to have data partitioned to *groups* of
> nodes. Something like this:
>
> CREATE NODE GROUP dng1 WITH dn1a, dn1b, dn1c;
> CREATE NODE GROUP dng2 WITH dn2a, dn2b, dn2c;
> CREATE NODE GROUP dng3 WITH dn3a, dn3b, dn3c;
>
> Then create the table like this:
>
> CREATE TABLE foo (id integer PRIMARY KEY, name text)
> DISTRIBUTE BY HASH(id) TO dng1, dng2, dng3;
>
> So data written to dng1 would be on all three data nodes in that group.
> Data written to dng2 would be on all three data nodes in that group. And
> likewise for dng3. So if I insert:
>
> INSERT INTO foo VALUES (1, 'hi');
>
> Then that data might be written to dng1, and so be on dn1a, dn1b, and
> dn1c. Another row:
>
> INSERT INTO foo VALUES (2, 'yo');
>
> Might be written to dng2, and so be on dn2a, dn2b, and dn2c.
>
> Essentially, the data is partitioned to groups, and those groups are
> themselves replicated (using two-phase commit, I guess?). The advantage
> then is, if one of those nodes were to go away, the data would still be
> there. That is, if I shut down dn2a, then do
>
> SELECT * FROM foo WHERE id = 2;
>
> It would still return the data, from either dn2b or dn2c.
>
> Is this do-able with Postgres-XC? If not, is something like this planned?
>
> Thanks,
>
> David

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: David E. W. <da...@ju...> - 2012-03-02 05:55:45
|
On Mar 1, 2012, at 8:18 PM, Koichi Suzuki wrote:

> I remember that Jan Wieck made the same proposal in CHAR(11), to
> choose a suitable distribution for a query. This will bring
> interesting topics to the planner.

I don’t think it has to, not in a first cut, anyway. It just needs to pick one. For my current project, I will be making sure that all the data needed is on a single node (or its clone(s)), so the query can just be pushed down to the node’s planner.

> On the other hand, this gives distributed tables characteristics
> similar to replicated tables. So far, we have some restrictions on
> replicated tables, for example, cursors and select for update.
> Multi-distributed tables may have the same properties.

Yes, of course.

> I know this will be worth thinking about from the point of view of the
> distributed query plan. I don't know if this helps for availability,
> because all the objects in the database must be ready for this and it
> is a bit tough to ask DBAs to do this completely. Instead, for
> availability, I think synchronous streaming replication will be a
> better solution.

That adds a different kind of complexity, though.

As a data architect who basically needs to be able to design a cluster that looks like a single database (which XC’s coordinator/node model does a nice job of) yet has distributed redundancy akin to RAID (which XC does not have), I am interested in a system that is just designed to work like that, to provide the interfaces I need. I feel like XC is close to that syntactically, given the various DISTRIBUTE BY attributes. But it is not yet designed for failure (no duplicate data except data duplicated on *all* nodes; no hot-swappable nodes). I would still have to work as hard as I can to keep nodes up and running, when what I want is to be able to have nodes fail and not care (at least not urgently).

Anyway, I realize I’m asking for something that is non-trivial. I ask only to be sure that I didn’t miss something so far. There’s a lot of promise here. I can’t wait to see how it all shakes out.

Best,

David
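For reference, a rough sketch of the two distribution modes David contrasts above, as XC supports them today (table names are arbitrary and the exact clause syntax may differ between XC versions):

    -- hash-distributed: each row lives on exactly one datanode
    CREATE TABLE foo_hash (id integer PRIMARY KEY, name text)
        DISTRIBUTE BY HASH(id);

    -- replicated: every datanode holds a full copy of the table
    CREATE TABLE foo_repl (id integer PRIMARY KEY, name text)
        DISTRIBUTE BY REPLICATION;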
From: Koichi S. <koi...@gm...> - 2012-03-02 04:18:18
|
I remember that Jan Wieck made the same proposal in CHAR(11), to choose a suitable distribution for a query. This will bring interesting topics to the planner.

On the other hand, this gives distributed tables characteristics similar to replicated tables. So far, we have some restrictions on replicated tables, for example, cursors and select for update. Multi-distributed tables may have the same properties.

I know this will be worth thinking about from the point of view of the distributed query plan. I don't know if this helps for availability, because all the objects in the database must be ready for this and it is a bit tough to ask DBAs to do this completely. Instead, for availability, I think synchronous streaming replication will be a better solution.

Regards;
----------
Koichi Suzuki

2012/3/2 David E. Wheeler <da...@ju...>:
> PGXCers,
>
> I’m wondering if there is some way to have data in a table partitioned
> among groups of nodes. Right now, I can have data partitioned to, say,
> three data nodes.
>
> CREATE TABLE foo (id integer PRIMARY KEY, name text)
> DISTRIBUTE BY HASH(id) TO dn1, dn2, dn3;
>
> However, if one of them gets its power pulled, the data goes away until
> the node comes back up.
>
> What I’d like to be able to do is to have data partitioned to *groups* of
> nodes. Something like this:
>
> CREATE NODE GROUP dng1 WITH dn1a, dn1b, dn1c;
> CREATE NODE GROUP dng2 WITH dn2a, dn2b, dn2c;
> CREATE NODE GROUP dng3 WITH dn3a, dn3b, dn3c;
>
> Then create the table like this:
>
> CREATE TABLE foo (id integer PRIMARY KEY, name text)
> DISTRIBUTE BY HASH(id) TO dng1, dng2, dng3;
>
> So data written to dng1 would be on all three data nodes in that group.
> Data written to dng2 would be on all three data nodes in that group. And
> likewise for dng3. So if I insert:
>
> INSERT INTO foo VALUES (1, 'hi');
>
> Then that data might be written to dng1, and so be on dn1a, dn1b, and
> dn1c. Another row:
>
> INSERT INTO foo VALUES (2, 'yo');
>
> Might be written to dng2, and so be on dn2a, dn2b, and dn2c.
>
> Essentially, the data is partitioned to groups, and those groups are
> themselves replicated (using two-phase commit, I guess?). The advantage
> then is, if one of those nodes were to go away, the data would still be
> there. That is, if I shut down dn2a, then do
>
> SELECT * FROM foo WHERE id = 2;
>
> It would still return the data, from either dn2b or dn2c.
>
> Is this do-able with Postgres-XC? If not, is something like this planned?
>
> Thanks,
>
> David
From: Michael P. <mic...@gm...> - 2012-03-02 04:09:05
|
On Fri, Mar 2, 2012 at 10:13 AM, David E. Wheeler <da...@ju...> wrote:
> PGXCers,
>
> I’m wondering if there is some way to have data in a table partitioned
> among groups of nodes. Right now, I can have data partitioned to, say,
> three data nodes.
>
> CREATE TABLE foo (id integer PRIMARY KEY, name text)
> DISTRIBUTE BY HASH(id) TO dn1, dn2, dn3;
>
> However, if one of them gets its power pulled, the data goes away until
> the node comes back up.
>
> What I’d like to be able to do is to have data partitioned to *groups* of
> nodes. Something like this:
>
> CREATE NODE GROUP dng1 WITH dn1a, dn1b, dn1c;
> CREATE NODE GROUP dng2 WITH dn2a, dn2b, dn2c;
> CREATE NODE GROUP dng3 WITH dn3a, dn3b, dn3c;
>
> Then create the table like this:
>
> CREATE TABLE foo (id integer PRIMARY KEY, name text)
> DISTRIBUTE BY HASH(id) TO dng1, dng2, dng3;
>
> So data written to dng1 would be on all three data nodes in that group.
> Data written to dng2 would be on all three data nodes in that group. And
> likewise for dng3. So if I insert:
>
> INSERT INTO foo VALUES (1, 'hi');
>
> Then that data might be written to dng1, and so be on dn1a, dn1b, and
> dn1c. Another row:
>
> INSERT INTO foo VALUES (2, 'yo');
>
> Might be written to dng2, and so be on dn2a, dn2b, and dn2c.
>
> Essentially, the data is partitioned to groups, and those groups are
> themselves replicated (using two-phase commit, I guess?). The advantage
> then is, if one of those nodes were to go away, the data would still be
> there. That is, if I shut down dn2a, then do
>
> SELECT * FROM foo WHERE id = 2;
>
> It would still return the data, from either dn2b or dn2c.
>
> Is this do-able with Postgres-XC?

Ashutosh suggested something about implementing complicated partitioning in XC, which is to extend the existing mechanisms of PostgreSQL. I don't know how much work it would need, though. So this is do-able.

> If not, is something like this planned?

After 1.0, as our main concern will be node addition and removal, this is not a top priority.

Regards,
-- 
Michael Paquier
http://michael.otacoo.com
From: Pavan D. <pav...@en...> - 2012-03-02 03:52:18
|
On Fri, Mar 2, 2012 at 6:52 AM, David E. Wheeler <da...@ju...> wrote:
> On Mar 1, 2012, at 5:21 PM, Michael Paquier wrote:
>
>> If you kick pgxc_pool_reload 100 times automatically at the end of create node, you will end up aborting everything more than necessary. A manual action with psql, or an external application kicking pgxc_pool_reload, brings more flexibility. If you kick it automatically after create/alter/drop node you lose that flexibility.
>
> Yeah, but if you do it at the end of a transaction, then it’s implicit that you can do it on one call, with all your changes at once, without having to emit a warning. Perhaps emit instead a NOTICE if it’s done with autocommit.

Irrespective of whether we do it at the end of the transaction or the command, I agree with David that it's useful if we can avoid that additional step of calling pgxc_pool_reload. Users would most likely forget to do that and then either start seeing warnings or data consistency issues. I don't think node addition/removal is a very frequent activity, so we don't need to think too much about performance etc.

Another related thought I had in mind is to do some sanity checking while running the CREATE NODE command. Maybe we want to test whether a server (coordinator or data node) is running on the given host/port and flag an error if it's not running. That might be useful to ensure a sane and consistent pooler state.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB
http://www.enterprisedb.com
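As a side note, a minimal sketch of the node metadata such a sanity check would be validating; treat the exact column list of the pgxc_node catalog as an assumption to be verified against the XC release in use:

    -- what CREATE NODE records on the coordinator; the pooler cache must match this
    SELECT node_name, node_type, node_host, node_port FROM pgxc_node;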
From: David E. W. <da...@ju...> - 2012-03-02 01:22:20
|
On Mar 1, 2012, at 5:21 PM, Michael Paquier wrote: > If you kick pgxc_pool_reload 100 times automatically at the end of create node, you will end up aborting everything more than necessary. A manual action with psql, or an external application kicking pgxc_pool_reload brings more flexibility. If you kick it automatically after create/alter/drop node you lose that flexibility. Yeah, but if you do it at the end of a transaction, then it’s implicit that you can do it on one call, with all your changes at once, without having to emit a warning. Perhaps emit instead a NOTICE if it’s done with autocommit. Best, David |
From: Michael P. <mic...@gm...> - 2012-03-02 01:20:40
|
On 2012/03/02, at 10:10, "David E. Wheeler" <da...@ju...> wrote: > On Mar 1, 2012, at 5:04 PM, Michael Paquier wrote: > >>> If the answer is “Because you might be making a bunch of node changes at one time and want to reload them all at once,” then why not automatically call pgxc_pool_reload() at the end of the current transaction? >> >> Well, yes ;) >> And the fact that pgxc_pool_reload aborts all the transaction currently running on coordinator. Imagine that you setup a cluster with 100 nodes, that would be costly not only pooler, but in the whole server unnecessarily. > > I’m sorry, you lost me. pgxc_pool_reload() aborts all transactions on the coordinator you run it on. What has that to do with the other 99 nodes? If you kick pgxc_pool_reload 100 times automatically at the end of create node, you will end up aborting everything more than necessary. A manual action with psql, or an external application kicking pgxc_pool_reload brings more flexibility. If you kick it automatically after create/alter/drop node you lose that flexibility. > > Thanks, > > David > |
From: David E. W. <da...@ju...> - 2012-03-02 01:13:58
|
PGXCers,

I’m wondering if there is some way to have data in a table partitioned among groups of nodes. Right now, I can have data partitioned to, say, three data nodes.

    CREATE TABLE foo (id integer PRIMARY KEY, name text)
    DISTRIBUTE BY HASH(id) TO dn1, dn2, dn3;

However, if one of them gets its power pulled, the data goes away until the node comes back up.

What I’d like to be able to do is to have data partitioned to *groups* of nodes. Something like this:

    CREATE NODE GROUP dng1 WITH dn1a, dn1b, dn1c;
    CREATE NODE GROUP dng2 WITH dn2a, dn2b, dn2c;
    CREATE NODE GROUP dng3 WITH dn3a, dn3b, dn3c;

Then create the table like this:

    CREATE TABLE foo (id integer PRIMARY KEY, name text)
    DISTRIBUTE BY HASH(id) TO dng1, dng2, dng3;

So data written to dng1 would be on all three data nodes in that group. Data written to dng2 would be on all three data nodes in that group. And likewise for dng3. So if I insert:

    INSERT INTO foo VALUES (1, 'hi');

Then that data might be written to dng1, and so be on dn1a, dn1b, and dn1c. Another row:

    INSERT INTO foo VALUES (2, 'yo');

Might be written to dng2, and so be on dn2a, dn2b, and dn2c.

Essentially, the data is partitioned to groups, and those groups are themselves replicated (using two-phase commit, I guess?). The advantage then is, if one of those nodes were to go away, the data would still be there. That is, if I shut down dn2a, then do

    SELECT * FROM foo WHERE id = 2;

It would still return the data, from either dn2b or dn2c.

Is this do-able with Postgres-XC? If not, is something like this planned?

Thanks,

David
From: David E. W. <da...@ju...> - 2012-03-02 01:11:09
|
On Mar 1, 2012, at 5:04 PM, Michael Paquier wrote: >> If the answer is “Because you might be making a bunch of node changes at one time and want to reload them all at once,” then why not automatically call pgxc_pool_reload() at the end of the current transaction? > > Well, yes ;) > And the fact that pgxc_pool_reload aborts all the transaction currently running on coordinator. Imagine that you setup a cluster with 100 nodes, that would be costly not only pooler, but in the whole server unnecessarily. I’m sorry, you lost me. pgxc_pool_reload() aborts all transactions on the coordinator you run it on. What has that to do with the other 99 nodes? Thanks, David |
From: Michael P. <mic...@gm...> - 2012-03-02 01:04:36
|
On 2012/03/02, at 9:36, "David E. Wheeler" <da...@ju...> wrote:

> On Mar 1, 2012, at 3:57 PM, Michael Paquier wrote:
>
>> Well, this warning means that the connection information cached in pooler
>> is not the same as what is inside the catalogs.
>> What is potentially very dangerous because you could perform inconsistent
>> operations in the cluster so a message is necessary I think. You absolutely
>> need to make connection information cached consistent.
>>
>> Try also to connect to a server with inconsistent pooler data cached as
>> non-superuser, you will see that server will return an error and close your
>> connection.
>> Then, perhaps a warning is too much exagerated for superuser, and a notice
>> would be enough.
>> But as it is a cluster-related message, I suppose not.
>> Opinions?
>
> Well, why is it necessary? Why can it not call pgxc_pool_reload() itself whenever I add a node?
>
> If the answer is “Because you might be making a bunch of node changes at one time and want to reload them all at once,” then why not automatically call pgxc_pool_reload() at the end of the current transaction?

Well, yes ;)

And the fact that pgxc_pool_reload aborts all the transactions currently running on the coordinator. Imagine that you set up a cluster with 100 nodes; that would be costly, not only for the pooler but for the whole server, unnecessarily.

>>> I also see a lot of DEBUG entries in the log, even though log_min_messages
>>> is warning. Necessary?
>>> I similarly see this when shutting down a node:
>>> DEBUG: logger shutting down
>>
>> The message is a PostgreSQL message. With log_min_messages, you shouldn't
>> see those messages on server logs as you say. Do you see them on client
>> side? Perhaps client_min_messages is set at debug.
>
> Oh, so it is, thanks to the OPTIONS stuff I copied from your sample shell script. Duh.
>
> Thanks,
>
> David
From: David E. W. <da...@ju...> - 2012-03-02 00:36:29
|
On Mar 1, 2012, at 3:57 PM, Michael Paquier wrote: > Well, this warning means that the connection information cached in pooler > is not the same as what is inside the catalogs. > What is potentially very dangerous because you could perform inconsistent > operations in the cluster so a message is necessary I think. You absolutely > need to make connection information cached consistent. > > Try also to connect to a server with inconsistent pooler data cached as > non-superuser, you will see that server will return an error and close your > connection. > Then, perhaps a warning is too much exagerated for superuser, and a notice > would be enough. > But as it is a cluster-related message, I suppose not. > Opinions? Well, why is it necessary? Why can it not call pgxc_pool_reload() itself whenever I add a node? If the answer is “Because you might be making a bunch of node changes at one time and want to reload them all at once,” then why not automatically call pgxc_pool_reload() at the end of the current transaction? > I also see a lot of DEBUG entries in the log, even though log_min_messages >> is warning. Necessary? >> I similarly see this when shutting down a node: >> DEBUG: logger shutting down > > The message is a PostgreSQL message. With log_min_messages, you shouldn't > see those messages on server logs as you say. Do you see them on client > side? Perhaps client_min_messages is set at debug. Oh, so it is, thanks to the OPTIONS stuff I copied from your sample shell script. Duh. Thanks, David |
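For reference, a minimal sketch of the manual sequence under discussion; the node names, host and port are made up, and the CREATE NODE option syntax should be checked against the XC release in use:

    -- register the new nodes on this coordinator
    CREATE NODE dn4 WITH (TYPE = 'datanode', HOST = 'dn4.example.com', PORT = 15432);
    CREATE NODE dn5 WITH (TYPE = 'datanode', HOST = 'dn5.example.com', PORT = 15432);

    -- today this second step is manual; the thread debates making it implicit
    SELECT pgxc_pool_reload();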