From: Koichi S. <ko...@in...> - 2012-08-30 09:27:04
On Thu, 30 Aug 2012 17:18:48 +0900 Michael Paquier <mic...@gm...> wrote:
> On Thu, Aug 30, 2012 at 5:11 PM, Koichi Suzuki <ko...@in...> wrote:
> > Hi, Michael;
> >
> > Thanks for the detailed research and consideration. I think they mostly work fine. I have a couple of questions.
> > 1) What happens if the trigger is not shippable and the target is a replicated table? (I'm not sure if there's such a case actually, though.)
>
> For row-based triggers?

Yes.

> > 2) In the case of replicated tables, trigger functions may alter the NEW tuples. Are there any means to enforce the integrity?
>
> Do you mean ship the check on constraints to remote nodes?

I don't insist on specific means. If it works efficiently, it's okay.

> > Or should trigger functions be responsible for maintaining it? (I think both make sense.)
>
> Do you mean that constraints need to be checked on the Coordinator? Controlled by the Coordinator?

Whatever means we use, I think someone should enforce the integrity, not necessarily the core. A replicated table needs specific (and careful) handling to keep all the replicas holding identical rows. I understand it's not always easy to do in the core. I'm wondering if we can enforce it when trigger functions alter the NEW row using volatile functions. I also think it's acceptable to ask trigger functions to be responsible for maintaining it. I hope they are in your scope.

Regards;

> --
> Michael Paquier
> http://michael.otacoo.com
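The consistency concern above is easiest to see with a concrete trigger. The following sketch is illustrative only (the table and column names are invented, not taken from the thread): a row trigger whose function stamps NEW with a volatile value would produce a different result on every Datanode if it were fired independently on each copy of a replicated table.

-- Illustrative sketch: a BEFORE ROW trigger calling a volatile function.
-- If each Datanode fired this independently on a replicated table, the
-- replicas could end up holding different rows.
CREATE FUNCTION stamp_row() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := clock_timestamp();  -- volatile: evaluates differently on each node
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER stamp_before_update
    BEFORE UPDATE ON replicated_tab
    FOR EACH ROW EXECUTE PROCEDURE stamp_row();

Presumably this is one of the cases the shippability evaluation has to catch: a trigger like this cannot be shipped, and would have to be fired once on the Coordinator with the resulting NEW row propagated identically to all replicas.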
From: Michael P. <mic...@gm...> - 2012-08-30 08:54:27
On Thu, Aug 30, 2012 at 5:32 PM, Ashutosh Bapat < ash...@en...> wrote: > > > On Thu, Aug 30, 2012 at 1:22 PM, Michael Paquier < > mic...@gm...> wrote: > >> Hi all, >> >> I have spent some time doing a more precise analysis of the effort >> necessary to support triggers, and here are my conclusions. >> 1) statement-based triggers do not require that much effort. They do not >> use tuple data so they can work natively. >> 2) we need a mechanism to block non-shippable triggers from being fired >> on Datanodes. If a trigger has been fired already on Coordinator, it makes >> no sense to have a trigger firing on Datanodes. The trigger definition is >> available on all nodes, so it is easy to evaluate if the trigger was >> shipped or not based on the evaluation done on Coordinator. i already got a >> mechanism implemented to evaluate the shippability of triggers, so this >> will not require that much effort to extend that for Datanodes. >> 3) Based on the list of parameters that can be used inside triggers => >> http://www.postgresql.org/docs/9.1/static/plpgsql-trigger.html, all the >> parameters will work natively even if the trigger is fired on Coordinator >> and on Datanode. Only 2 parameters need special effort: OLD and NEW. They >> specify the old and new tuples before and after the trigger is fired. The >> tuple data is basically fetched using the CTID of tuple, but in the case of >> XC this is not enough, we need to couple the CTID withe xc_node_id to get >> the correct tuple from the correct remote node when trigger is fired on >> Coordinator >> 4) About constraint triggers, which is a trigger that can be deferred. >> With what I understood from the code, if a trigger is deferred all the >> CTIDs of the concerned tuples are saved and then used at the same time when >> trigger is fired in a common event. With the CTIDs gathered, vanilla gets >> the necessary tuples from disk and then fire the triggers. In the case of >> XC there are 2 problems: >> 4-1) TIDs are not enough, we need to couple them with xc_node_id to get >> unique tuple from remote nodes. >> 4-2) We need to get the tuples from remote nodes depending on relation >> distribution. >> Well, this is related to the parameters OLD and NEW, we need a way to >> gather tuples from remote nodes. A constraint trigger can be only of the >> type AFTER ROW. >> 5) About row-based triggers, this is related to the need of extension for >> parameters OLD and NEW. >> When a row-based trigger is invocated we use the CTID to get the data of >> the OLD or NEW tuple with GetTupleForTrigger in trigger.c. This is used by >> UPDATE and DELETE triggers for BEFORE and AFTER. In the case of UPDATE the >> data is already present so no worries. >> >> So the main problem is fetching tuples from remote nodes when firing a >> trigger. >> It is necessary to create something that is XC-specific to handle that. >> In vanilla, what happens is that we simply fetch the tuple from disk >> with the CTID, make a copy of it, and then Store the modified tuple with >> ExecStoreTuple using the same slot. But in the case of XC the tuple slot is >> on the remote node, as well as the tuple data. >> So, here is what I propose to solve that. >> >> For old tuple fetching, I am thinking about using the exising COPY TO >> mechanism integrated with tuplestore that has been used for table >> redistribution. 
>> When fetching a tuple with given CTID, we use a query of the type "COPY >> (SELECT * FROM table where ctid = $ctid) TO stdout" and send it to the >> necessary node. >> The result is saved in tuplestore and send back to the executor for >> treatment. >> To store the new tuple, it am thinking about materializing the tuple, and >> then generate an UPDATE query that will be sent to the correct remote node >> to update the tuple correctly. >> A query of the type: "UPDATE table SET col1= $col1... WHERE ctid = $ctid". >> As a limitation, it will not be possible to update the distribution >> column of a tuple inside a trigger... This would mean that tuple has to be >> moved from a node to another. >> Well, XC already has this limitation, but I honestly think we can live >> with that. >> >> So, for the implementation I was thinking about 2 steps: >> 1) Implement the shippability evaluation for triggers, necessary to >> decide if a trigger can be shipped to a remote node or not, and necessary >> to verify on a remote node if a trigger has already been fired or not from >> a Coordinator. This step will add support for statement-based triggers for >> INSERT, UPDATE, DELETE and TRUNCATE. >> CONSTRAINT triggers are not included in this step as they can be only of >> type AFTER ROW >> After this step is done, test cases provided of course, all the >> shippability mechanism will be in place. >> The row-based triggers for INSERT will also work at this step, but still >> the target here is only statement-based triggers, so they will be blocked. >> >> 2) Implement the row-based mechanism. As mentionned above, this is >> necessary to fetch data from OLD tuples and update the NEW tuples to remote >> nodes. >> At this step all the types of triggers will be implemented. It might be >> necessary to implement some extra things for constraint triggers but just >> by looking at the code I don't think so. >> > > I have looked at this code, and possibly you can use the same mechanism as > what Abbas is using for RETURNING. > Yes you are right it might be possible. I do not exclude a method or another for row-based triggers. And using a maximum things in place will reduce effort. Let's decide that once the statement-based triggers are in with the shippability mechanisms. -- Michael Paquier http://michael.otacoo.com |
From: Ashutosh B. <ash...@en...> - 2012-08-30 08:32:37
On Thu, Aug 30, 2012 at 1:22 PM, Michael Paquier <mic...@gm...>wrote: > Hi all, > > I have spent some time doing a more precise analysis of the effort > necessary to support triggers, and here are my conclusions. > 1) statement-based triggers do not require that much effort. They do not > use tuple data so they can work natively. > 2) we need a mechanism to block non-shippable triggers from being fired on > Datanodes. If a trigger has been fired already on Coordinator, it makes no > sense to have a trigger firing on Datanodes. The trigger definition is > available on all nodes, so it is easy to evaluate if the trigger was > shipped or not based on the evaluation done on Coordinator. i already got a > mechanism implemented to evaluate the shippability of triggers, so this > will not require that much effort to extend that for Datanodes. > 3) Based on the list of parameters that can be used inside triggers => > http://www.postgresql.org/docs/9.1/static/plpgsql-trigger.html, all the > parameters will work natively even if the trigger is fired on Coordinator > and on Datanode. Only 2 parameters need special effort: OLD and NEW. They > specify the old and new tuples before and after the trigger is fired. The > tuple data is basically fetched using the CTID of tuple, but in the case of > XC this is not enough, we need to couple the CTID withe xc_node_id to get > the correct tuple from the correct remote node when trigger is fired on > Coordinator > 4) About constraint triggers, which is a trigger that can be deferred. > With what I understood from the code, if a trigger is deferred all the > CTIDs of the concerned tuples are saved and then used at the same time when > trigger is fired in a common event. With the CTIDs gathered, vanilla gets > the necessary tuples from disk and then fire the triggers. In the case of > XC there are 2 problems: > 4-1) TIDs are not enough, we need to couple them with xc_node_id to get > unique tuple from remote nodes. > 4-2) We need to get the tuples from remote nodes depending on relation > distribution. > Well, this is related to the parameters OLD and NEW, we need a way to > gather tuples from remote nodes. A constraint trigger can be only of the > type AFTER ROW. > 5) About row-based triggers, this is related to the need of extension for > parameters OLD and NEW. > When a row-based trigger is invocated we use the CTID to get the data of > the OLD or NEW tuple with GetTupleForTrigger in trigger.c. This is used by > UPDATE and DELETE triggers for BEFORE and AFTER. In the case of UPDATE the > data is already present so no worries. > > So the main problem is fetching tuples from remote nodes when firing a > trigger. > It is necessary to create something that is XC-specific to handle that. > In vanilla, what happens is that we simply fetch the tuple from disk with > the CTID, make a copy of it, and then Store the modified tuple with > ExecStoreTuple using the same slot. But in the case of XC the tuple slot is > on the remote node, as well as the tuple data. > So, here is what I propose to solve that. > > For old tuple fetching, I am thinking about using the exising COPY TO > mechanism integrated with tuplestore that has been used for table > redistribution. > When fetching a tuple with given CTID, we use a query of the type "COPY > (SELECT * FROM table where ctid = $ctid) TO stdout" and send it to the > necessary node. > The result is saved in tuplestore and send back to the executor for > treatment. 
> To store the new tuple, it am thinking about materializing the tuple, and > then generate an UPDATE query that will be sent to the correct remote node > to update the tuple correctly. > A query of the type: "UPDATE table SET col1= $col1... WHERE ctid = $ctid". > As a limitation, it will not be possible to update the distribution column > of a tuple inside a trigger... This would mean that tuple has to be moved > from a node to another. > Well, XC already has this limitation, but I honestly think we can live > with that. > > So, for the implementation I was thinking about 2 steps: > 1) Implement the shippability evaluation for triggers, necessary to decide > if a trigger can be shipped to a remote node or not, and necessary to > verify on a remote node if a trigger has already been fired or not from a > Coordinator. This step will add support for statement-based triggers for > INSERT, UPDATE, DELETE and TRUNCATE. > CONSTRAINT triggers are not included in this step as they can be only of > type AFTER ROW > After this step is done, test cases provided of course, all the > shippability mechanism will be in place. > The row-based triggers for INSERT will also work at this step, but still > the target here is only statement-based triggers, so they will be blocked. > > 2) Implement the row-based mechanism. As mentionned above, this is > necessary to fetch data from OLD tuples and update the NEW tuples to remote > nodes. > At this step all the types of triggers will be implemented. It might be > necessary to implement some extra things for constraint triggers but just > by looking at the code I don't think so. > I have looked at this code, and possibly you can use the same mechanism as what Abbas is using for RETURNING. > > So, I will first focus on step 1, which is support for statement-based > triggers. > Once step 1 is done, row-based triggers (step 2) will be done. The > mechanism of step 2 is related to global constraint, as we need to be able > to fetch tuples from remote nodes to check constraints on Coordinators fi > necessary. > > Regards, > > > On Mon, Aug 27, 2012 at 10:36 PM, Michael Paquier < > mic...@gm...> wrote: > >> >> >> On Mon, Aug 27, 2012 at 9:23 PM, Ashutosh Bapat < >> ash...@en...> wrote: >> >>> >>> >>> On Fri, Aug 24, 2012 at 9:35 AM, Michael Paquier < >>> mic...@gm...> wrote: >>> >>>> Hi all, >>>> >>>> I am currently looking at the implementation of triggers and designed >>>> some simple test sets that I am planning to extend for multiple table types. >>>> Those tests contains I think the most commonly used triggers used in PG >>>> based on the events (INSERT, UPDATE, DELETE, nothing for TRUNCATE yet) and >>>> the firing moments (BEFORE, AFTER, INSTEAD OF). >>>> You can find them attached, they run correctly on vanilla. >>>> >>>> Btw, I already designed and implemented some code allowing to evaluate >>>> if a trigger can be safely shipped to Datanodes, and also some code to >>>> create, drop and rename triggers safely in cluster. >>>> My code is published here (don't mind it being public): >>>> https://github.com/michaelpq/pgxc/tree/trigger. >>>> Those are the basics I think, and I already coded them. 
>>>> >>>> For the time being, a trigger is defined a shippable to remote nodes if >>>> it satisfies all those conditions: >>>> - the query invocating it is entirely shippable, former FQS code, no >>>> nothing to do >>>> - the procedure launched is shippable, code already implemented >>>> - the trigger is of type BEFORE/AFTER (INSTEAD OF can only be defined >>>> on views, and views are only available on Coordinators), code already >>>> implemented >>>> - event is of type INSERT/DELETE/UPDATE. Events on TRUNCATE need to be >>>> launched at Coordinator to keep global control. (Events triggers in 9.3 >>>> will be the same), code already implemented >>>> >>>> Then, in order to fundamentaly being able to support all the types of >>>> triggers, 2 things are needed: >>>> 1) Better implementation support for parameters, particularly cached >>>> plans and functions >>>> Now when a trigger is launched it fails with the following error: >>>> postgres=# INSERT INTO my_table VALUES (1,2); >>>> ERROR: cache lookup failed for type 0 >>>> CONTEXT: SQL statement "UPDATE table_stats SET num_insert_row = >>>> num_insert_row + 1 WHERE table_name = TG_TABLE_NAME" >>>> PL/pgSQL function count_insert_row() line 3 at SQL statement >>>> This means that the parameters used inside the executed query are not >>>> set properly in the cached plan used by the plpgsql function invocated by >>>> trigger >>>> >>> >>> Where are parameters involved here? Please provide full testcase. The >>> trigger definitions are missing. >>> >> The test cases are attached. >> You also need to fetch the code from this branch: >> https://github.com/michaelpq/pgxc/tree/trigger >> >> You will see why fixing the parameters first is really important... >> >> >>> >>>> 2) Implement an alternative to GetTupleForTrigger in trigger.c which is >>>> cluster-wide. GetTupleForTrigger is used by AFTER ROW triggers to get a >>>> tuple to be modified from disk. This API is used to get a local tuple, but >>>> in the case of XC we need something allowing to fetch a tuple from a remote >>>> node. I am thinking about using some of the APIs in remotecopy.c to get >>>> easily this information from remote nodes. >>>> >>>> Point 1) is really important as it currently impacts several tests >>>> cases (plpgsql, plancache, rangefuncs: issue 3553038 in tracker), so I will >>>> work on that first and submit a patch only for that. >>>> Once 1) is done, 2) will include a more complete trigger implementation >>>> with the FQS code I already wrote and the alternative to GetTupleForTrigger >>>> (called GetRemoteTupleForTrigger?). >>>> >>>> As far as I could see, those are the things I need to do to have a >>>> fully implementation for triggers. >>>> Please note that I might have forgotten things that I did not notice, >>>> in consequence I will update my analysis once 1) is done. >>>> >>>> Regards. >>>> -- >>>> Michael Paquier >>>> http://michael.otacoo.com >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> Live Security Virtual Conference >>>> Exclusive live event will cover all the ways today's security and >>>> threat landscape has changed and how IT managers can respond. >>>> Discussions >>>> will include endpoint security, mobile security and the latest in >>>> malware >>>> threats. 
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>> _______________________________________________ >>>> Postgres-xc-developers mailing list >>>> Pos...@li... >>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>> >>>> >>> >>> >>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Enterprise Postgres Company >>> >>> >> >> >> -- >> Michael Paquier >> http://michael.otacoo.com >> > > > > -- > Michael Paquier > http://michael.otacoo.com > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Michael P. <mic...@gm...> - 2012-08-30 08:18:56
On Thu, Aug 30, 2012 at 5:11 PM, Koichi Suzuki <ko...@in...> wrote:
> Hi, Michael;
>
> Thanks for the detailed research and consideration. I think they mostly work fine. I have a couple of questions.
> 1) What happens if the trigger is not shippable and the target is a replicated table? (I'm not sure if there's such a case actually, though.)

For row-based triggers?

> 2) In the case of replicated tables, trigger functions may alter the NEW tuples. Are there any means to enforce the integrity?

Do you mean ship the check on constraints to remote nodes?

> Or should trigger functions be responsible for maintaining it? (I think both make sense.)

Do you mean that constraints need to be checked on the Coordinator? Controlled by the Coordinator?

--
Michael Paquier
http://michael.otacoo.com
From: Koichi S. <ko...@in...> - 2012-08-30 08:09:28
Hi, Michael;

Thanks for the detailed research and consideration. I think they mostly work fine. I have a couple of questions.
1) What happens if the trigger is not shippable and the target is a replicated table? (I'm not sure if there's such a case actually, though.)
2) In the case of replicated tables, trigger functions may alter the NEW tuples. Are there any means to enforce the integrity? Or should trigger functions be responsible for maintaining it? (I think both make sense.)

Regards;
---
Koichi

On Thu, 30 Aug 2012 16:52:15 +0900 Michael Paquier <mic...@gm...> wrote:
> Hi all,
>
> I have spent some time doing a more precise analysis of the effort necessary to support triggers, and here are my conclusions.
> 1) Statement-based triggers do not require that much effort. They do not use tuple data, so they can work natively.
> 2) We need a mechanism to block non-shippable triggers from being fired on Datanodes. If a trigger has already been fired on the Coordinator, it makes no sense to have the same trigger fire again on Datanodes. The trigger definition is available on all nodes, so it is easy to evaluate whether the trigger was shipped or not based on the evaluation done on the Coordinator. I already have a mechanism implemented to evaluate the shippability of triggers, so it will not require much effort to extend it for Datanodes.
> 3) Based on the list of parameters that can be used inside triggers (http://www.postgresql.org/docs/9.1/static/plpgsql-trigger.html), all the parameters will work natively whether the trigger is fired on the Coordinator or on a Datanode. Only two parameters need special effort: OLD and NEW. They hold the old and new tuples before and after the trigger is fired. The tuple data is basically fetched using the tuple's CTID, but in the case of XC this is not enough: we need to couple the CTID with xc_node_id to get the correct tuple from the correct remote node when the trigger is fired on the Coordinator.
> 4) About constraint triggers, which are triggers that can be deferred: from what I understood of the code, if a trigger is deferred, all the CTIDs of the concerned tuples are saved and then used together when the trigger is fired in a common event. With the CTIDs gathered, vanilla gets the necessary tuples from disk and then fires the triggers. In the case of XC there are 2 problems:
> 4-1) TIDs are not enough; we need to couple them with xc_node_id to get a unique tuple from remote nodes.
> 4-2) We need to get the tuples from remote nodes depending on relation distribution.
> This is related to the parameters OLD and NEW: we need a way to gather tuples from remote nodes. A constraint trigger can only be of type AFTER ROW.
> 5) Row-based triggers are related to the need to extend the parameters OLD and NEW. When a row-based trigger is invoked, we use the CTID to get the data of the OLD or NEW tuple with GetTupleForTrigger in trigger.c. This is used by UPDATE and DELETE triggers, BEFORE and AFTER. In the case of UPDATE the data is already present, so no worries.
>
> So the main problem is fetching tuples from remote nodes when firing a trigger. It is necessary to create something XC-specific to handle that. In vanilla, we simply fetch the tuple from disk with the CTID, make a copy of it, and then store the modified tuple with ExecStoreTuple using the same slot.
But in the case of XC the tuple slot is > on the remote node, as well as the tuple data. > So, here is what I propose to solve that. > > For old tuple fetching, I am thinking about using the exising COPY TO > mechanism integrated with tuplestore that has been used for table > redistribution. > When fetching a tuple with given CTID, we use a query of the type "COPY > (SELECT * FROM table where ctid = $ctid) TO stdout" and send it to the > necessary node. > The result is saved in tuplestore and send back to the executor for > treatment. > To store the new tuple, it am thinking about materializing the tuple, and > then generate an UPDATE query that will be sent to the correct remote node > to update the tuple correctly. > A query of the type: "UPDATE table SET col1= $col1... WHERE ctid = $ctid". > As a limitation, it will not be possible to update the distribution column > of a tuple inside a trigger... This would mean that tuple has to be moved > from a node to another. > Well, XC already has this limitation, but I honestly think we can live with > that. > > So, for the implementation I was thinking about 2 steps: > 1) Implement the shippability evaluation for triggers, necessary to decide > if a trigger can be shipped to a remote node or not, and necessary to > verify on a remote node if a trigger has already been fired or not from a > Coordinator. This step will add support for statement-based triggers for > INSERT, UPDATE, DELETE and TRUNCATE. > CONSTRAINT triggers are not included in this step as they can be only of > type AFTER ROW > After this step is done, test cases provided of course, all the > shippability mechanism will be in place. > The row-based triggers for INSERT will also work at this step, but still > the target here is only statement-based triggers, so they will be blocked. > > 2) Implement the row-based mechanism. As mentionned above, this is > necessary to fetch data from OLD tuples and update the NEW tuples to remote > nodes. > At this step all the types of triggers will be implemented. It might be > necessary to implement some extra things for constraint triggers but just > by looking at the code I don't think so. > > So, I will first focus on step 1, which is support for statement-based > triggers. > Once step 1 is done, row-based triggers (step 2) will be done. The > mechanism of step 2 is related to global constraint, as we need to be able > to fetch tuples from remote nodes to check constraints on Coordinators fi > necessary. > > Regards, > > On Mon, Aug 27, 2012 at 10:36 PM, Michael Paquier <mic...@gm... > > wrote: > > > > > > > On Mon, Aug 27, 2012 at 9:23 PM, Ashutosh Bapat < > > ash...@en...> wrote: > > > >> > >> > >> On Fri, Aug 24, 2012 at 9:35 AM, Michael Paquier < > >> mic...@gm...> wrote: > >> > >>> Hi all, > >>> > >>> I am currently looking at the implementation of triggers and designed > >>> some simple test sets that I am planning to extend for multiple table types. > >>> Those tests contains I think the most commonly used triggers used in PG > >>> based on the events (INSERT, UPDATE, DELETE, nothing for TRUNCATE yet) and > >>> the firing moments (BEFORE, AFTER, INSTEAD OF). > >>> You can find them attached, they run correctly on vanilla. > >>> > >>> Btw, I already designed and implemented some code allowing to evaluate > >>> if a trigger can be safely shipped to Datanodes, and also some code to > >>> create, drop and rename triggers safely in cluster. 
> >>> My code is published here (don't mind it being public): > >>> https://github.com/michaelpq/pgxc/tree/trigger. > >>> Those are the basics I think, and I already coded them. > >>> > >>> For the time being, a trigger is defined a shippable to remote nodes if > >>> it satisfies all those conditions: > >>> - the query invocating it is entirely shippable, former FQS code, no > >>> nothing to do > >>> - the procedure launched is shippable, code already implemented > >>> - the trigger is of type BEFORE/AFTER (INSTEAD OF can only be defined on > >>> views, and views are only available on Coordinators), code already > >>> implemented > >>> - event is of type INSERT/DELETE/UPDATE. Events on TRUNCATE need to be > >>> launched at Coordinator to keep global control. (Events triggers in 9.3 > >>> will be the same), code already implemented > >>> > >>> Then, in order to fundamentaly being able to support all the types of > >>> triggers, 2 things are needed: > >>> 1) Better implementation support for parameters, particularly cached > >>> plans and functions > >>> Now when a trigger is launched it fails with the following error: > >>> postgres=# INSERT INTO my_table VALUES (1,2); > >>> ERROR: cache lookup failed for type 0 > >>> CONTEXT: SQL statement "UPDATE table_stats SET num_insert_row = > >>> num_insert_row + 1 WHERE table_name = TG_TABLE_NAME" > >>> PL/pgSQL function count_insert_row() line 3 at SQL statement > >>> This means that the parameters used inside the executed query are not > >>> set properly in the cached plan used by the plpgsql function invocated by > >>> trigger > >>> > >> > >> Where are parameters involved here? Please provide full testcase. The > >> trigger definitions are missing. > >> > > The test cases are attached. > > You also need to fetch the code from this branch: > > https://github.com/michaelpq/pgxc/tree/trigger > > > > You will see why fixing the parameters first is really important... > > > > > >> > >>> 2) Implement an alternative to GetTupleForTrigger in trigger.c which is > >>> cluster-wide. GetTupleForTrigger is used by AFTER ROW triggers to get a > >>> tuple to be modified from disk. This API is used to get a local tuple, but > >>> in the case of XC we need something allowing to fetch a tuple from a remote > >>> node. I am thinking about using some of the APIs in remotecopy.c to get > >>> easily this information from remote nodes. > >>> > >>> Point 1) is really important as it currently impacts several tests cases > >>> (plpgsql, plancache, rangefuncs: issue 3553038 in tracker), so I will work > >>> on that first and submit a patch only for that. > >>> Once 1) is done, 2) will include a more complete trigger implementation > >>> with the FQS code I already wrote and the alternative to GetTupleForTrigger > >>> (called GetRemoteTupleForTrigger?). > >>> > >>> As far as I could see, those are the things I need to do to have a fully > >>> implementation for triggers. > >>> Please note that I might have forgotten things that I did not notice, in > >>> consequence I will update my analysis once 1) is done. > >>> > >>> Regards. > >>> -- > >>> Michael Paquier > >>> http://michael.otacoo.com > >>> > >>> > >>> ------------------------------------------------------------------------------ > >>> Live Security Virtual Conference > >>> Exclusive live event will cover all the ways today's security and > >>> threat landscape has changed and how IT managers can respond. 
Discussions > >>> will include endpoint security, mobile security and the latest in malware > >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > >>> _______________________________________________ > >>> Postgres-xc-developers mailing list > >>> Pos...@li... > >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > >>> > >>> > >> > >> > >> -- > >> Best Wishes, > >> Ashutosh Bapat > >> EntepriseDB Corporation > >> The Enterprise Postgres Company > >> > >> > > > > > > -- > > Michael Paquier > > http://michael.otacoo.com > > > > > > -- > Michael Paquier > http://michael.otacoo.com |
From: Michael P. <mic...@gm...> - 2012-08-30 07:52:28
Hi all,

I have spent some time doing a more precise analysis of the effort necessary to support triggers, and here are my conclusions.

1) Statement-based triggers do not require that much effort. They do not use tuple data, so they can work natively.

2) We need a mechanism to block non-shippable triggers from being fired on Datanodes. If a trigger has already been fired on the Coordinator, it makes no sense to have the same trigger fire again on Datanodes. The trigger definition is available on all nodes, so it is easy to evaluate whether the trigger was shipped or not based on the evaluation done on the Coordinator. I already have a mechanism implemented to evaluate the shippability of triggers, so it will not require much effort to extend it for Datanodes.

3) Based on the list of parameters that can be used inside triggers (http://www.postgresql.org/docs/9.1/static/plpgsql-trigger.html), all the parameters will work natively whether the trigger is fired on the Coordinator or on a Datanode. Only two parameters need special effort: OLD and NEW. They hold the old and new tuples before and after the trigger is fired. The tuple data is basically fetched using the tuple's CTID, but in the case of XC this is not enough: we need to couple the CTID with xc_node_id to get the correct tuple from the correct remote node when the trigger is fired on the Coordinator.

4) About constraint triggers, which are triggers that can be deferred: from what I understood of the code, if a trigger is deferred, all the CTIDs of the concerned tuples are saved and then used together when the trigger is fired in a common event. With the CTIDs gathered, vanilla gets the necessary tuples from disk and then fires the triggers. In the case of XC there are 2 problems:
4-1) TIDs are not enough; we need to couple them with xc_node_id to get a unique tuple from remote nodes.
4-2) We need to get the tuples from remote nodes depending on relation distribution.
This is related to the parameters OLD and NEW: we need a way to gather tuples from remote nodes. A constraint trigger can only be of type AFTER ROW.

5) Row-based triggers are related to the need to extend the parameters OLD and NEW. When a row-based trigger is invoked, we use the CTID to get the data of the OLD or NEW tuple with GetTupleForTrigger in trigger.c. This is used by UPDATE and DELETE triggers, BEFORE and AFTER. In the case of UPDATE the data is already present, so no worries.

So the main problem is fetching tuples from remote nodes when firing a trigger. It is necessary to create something XC-specific to handle that. In vanilla, we simply fetch the tuple from disk with the CTID, make a copy of it, and then store the modified tuple with ExecStoreTuple using the same slot. But in the case of XC the tuple slot is on the remote node, as well as the tuple data. So here is what I propose to solve that.

For old tuple fetching, I am thinking about using the existing COPY TO mechanism integrated with the tuplestore that has been used for table redistribution. When fetching a tuple with a given CTID, we use a query of the type "COPY (SELECT * FROM table WHERE ctid = $ctid) TO stdout" and send it to the necessary node. The result is saved in a tuplestore and sent back to the executor for treatment.

To store the new tuple, I am thinking about materializing the tuple and then generating an UPDATE query that will be sent to the correct remote node to update the tuple correctly. A query of the type: "UPDATE table SET col1 = $col1... WHERE ctid = $ctid". As a limitation, it will not be possible to update the distribution column of a tuple inside a trigger... this would mean that the tuple has to be moved from one node to another. Well, XC already has this limitation, but I honestly think we can live with that.

So, for the implementation I was thinking about 2 steps:

1) Implement the shippability evaluation for triggers, necessary to decide if a trigger can be shipped to a remote node or not, and to verify on a remote node whether a trigger has already been fired from a Coordinator. This step will add support for statement-based triggers for INSERT, UPDATE, DELETE and TRUNCATE. CONSTRAINT triggers are not included in this step as they can only be of type AFTER ROW. After this step is done, test cases provided of course, all the shippability mechanism will be in place. Row-based triggers for INSERT would also work at this step, but since the target here is only statement-based triggers, they will be blocked.

2) Implement the row-based mechanism. As mentioned above, this is necessary to fetch data from OLD tuples and write the NEW tuples back to remote nodes. At this step all the types of triggers will be implemented. It might be necessary to implement some extra things for constraint triggers, but just by looking at the code I don't think so.

So, I will first focus on step 1, which is support for statement-based triggers. Once step 1 is done, row-based triggers (step 2) will be done. The mechanism of step 2 is related to global constraints, as we need to be able to fetch tuples from remote nodes to check constraints on Coordinators if necessary.

Regards,

On Mon, Aug 27, 2012 at 10:36 PM, Michael Paquier <mic...@gm...> wrote:
>
> On Mon, Aug 27, 2012 at 9:23 PM, Ashutosh Bapat <ash...@en...> wrote:
>
>> On Fri, Aug 24, 2012 at 9:35 AM, Michael Paquier <mic...@gm...> wrote:
>>
>>> Hi all,
>>>
>>> I am currently looking at the implementation of triggers and designed some simple test sets that I am planning to extend for multiple table types. Those tests contain, I think, the most commonly used triggers in PG based on the events (INSERT, UPDATE, DELETE, nothing for TRUNCATE yet) and the firing moments (BEFORE, AFTER, INSTEAD OF). You can find them attached; they run correctly on vanilla.
>>>
>>> Btw, I already designed and implemented some code allowing to evaluate if a trigger can be safely shipped to Datanodes, and also some code to create, drop and rename triggers safely in the cluster. My code is published here (don't mind it being public): https://github.com/michaelpq/pgxc/tree/trigger. Those are the basics, I think, and I have already coded them.
>>>
>>> For the time being, a trigger is defined as shippable to remote nodes if it satisfies all these conditions:
>>> - the query invoking it is entirely shippable (former FQS code), so nothing to do
>>> - the procedure launched is shippable, code already implemented
>>> - the trigger is of type BEFORE/AFTER (INSTEAD OF can only be defined on views, and views are only available on Coordinators), code already implemented
>>> - the event is of type INSERT/DELETE/UPDATE. Events on TRUNCATE need to be launched at the Coordinator to keep global control.
(Events triggers in 9.3 >>> will be the same), code already implemented >>> >>> Then, in order to fundamentaly being able to support all the types of >>> triggers, 2 things are needed: >>> 1) Better implementation support for parameters, particularly cached >>> plans and functions >>> Now when a trigger is launched it fails with the following error: >>> postgres=# INSERT INTO my_table VALUES (1,2); >>> ERROR: cache lookup failed for type 0 >>> CONTEXT: SQL statement "UPDATE table_stats SET num_insert_row = >>> num_insert_row + 1 WHERE table_name = TG_TABLE_NAME" >>> PL/pgSQL function count_insert_row() line 3 at SQL statement >>> This means that the parameters used inside the executed query are not >>> set properly in the cached plan used by the plpgsql function invocated by >>> trigger >>> >> >> Where are parameters involved here? Please provide full testcase. The >> trigger definitions are missing. >> > The test cases are attached. > You also need to fetch the code from this branch: > https://github.com/michaelpq/pgxc/tree/trigger > > You will see why fixing the parameters first is really important... > > >> >>> 2) Implement an alternative to GetTupleForTrigger in trigger.c which is >>> cluster-wide. GetTupleForTrigger is used by AFTER ROW triggers to get a >>> tuple to be modified from disk. This API is used to get a local tuple, but >>> in the case of XC we need something allowing to fetch a tuple from a remote >>> node. I am thinking about using some of the APIs in remotecopy.c to get >>> easily this information from remote nodes. >>> >>> Point 1) is really important as it currently impacts several tests cases >>> (plpgsql, plancache, rangefuncs: issue 3553038 in tracker), so I will work >>> on that first and submit a patch only for that. >>> Once 1) is done, 2) will include a more complete trigger implementation >>> with the FQS code I already wrote and the alternative to GetTupleForTrigger >>> (called GetRemoteTupleForTrigger?). >>> >>> As far as I could see, those are the things I need to do to have a fully >>> implementation for triggers. >>> Please note that I might have forgotten things that I did not notice, in >>> consequence I will update my analysis once 1) is done. >>> >>> Regards. >>> -- >>> Michael Paquier >>> http://michael.otacoo.com >>> >>> >>> ------------------------------------------------------------------------------ >>> Live Security Virtual Conference >>> Exclusive live event will cover all the ways today's security and >>> threat landscape has changed and how IT managers can respond. Discussions >>> will include endpoint security, mobile security and the latest in malware >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> _______________________________________________ >>> Postgres-xc-developers mailing list >>> Pos...@li... >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >>> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> > > > -- > Michael Paquier > http://michael.otacoo.com > -- Michael Paquier http://michael.otacoo.com |
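To make the proposed remote-tuple handling concrete, here is a minimal sketch of the two kinds of statements the Coordinator would generate, using a hypothetical table tab1; the table, columns and the literal ctid value are placeholders, and the exact queries built by the eventual patch may differ.

-- Fetch the OLD tuple for a trigger fired on the Coordinator: ship a
-- ctid-qualified SELECT wrapped in COPY to the Datanode that owns the row
-- (identified by xc_node_id) and load the result into a tuplestore.
COPY (SELECT * FROM tab1 WHERE ctid = '(0,1)') TO STDOUT;

-- Write the (possibly modified) NEW tuple back to the same Datanode.
UPDATE tab1 SET col1 = 'new value', col2 = 42 WHERE ctid = '(0,1)';

Note that the ctid alone is only meaningful on a single node; as the analysis points out, the Coordinator needs the (xc_node_id, ctid) pair to route each statement to the right Datanode, and updating the distribution column this way remains off-limits.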
From: Michael P. <mic...@gm...> - 2012-08-28 14:06:41
Hi Amit,

I am looking at your patch. Yes, I agree with the approach of using only callback functions and not having the system functions that might cause security issues. At least now your functionality is transparent to the user. I have a couple of comments.

1) Please delete the whitespace errors:
/Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:156: trailing whitespace.
/Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:163: space before tab in indent.
        UnlockSharedObject(DatabaseRelationId, db_id, 0, AccessExclusiveLock);
/Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:201: trailing whitespace.
#ifdef PGXC
/Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:311: trailing whitespace.
 * that we are removing are created by the same transaction, and are not
warning: 4 lines add whitespace errors.

2) You should remove 1 TAB when calling AtEOXact_DBCleanup(true) in xact.c; the code is not correctly aligned.

3) For the regression test you are looking for, please create a plpgsql function on the model of what is in xc_create_function.sql. There are things already there to make create/alter node operations transparent whatever the number of nodes. I would suggest something like:
CREATE OR REPLACE FUNCTION alter_table_change_nodes(tbspace_name text, nodenum int[]) ...
This will create a tablespace only on the nodes listed in the array nodenum. What this function will do is simply get the node name for each node number and launch:
EXECUTE DIRECT ON (nodeN) 'CREATE TABLESPACE tbspace_name';
As an automatic test, call this function for the first node of the cluster and then recreate a tablespace with the same name. With your patch, tablespace creation will fail on node 1. Have a closer look at alter_table_change_nodes and create_table_nodes to see how Abbas and I tested XC features on sets of nodes.

4) I see this code in execRemote.c:
+       if (!handle->error)
+       {
+               int nodenum = PGXCNodeGetNodeId(handle->nodeoid, node_type);
+               if (!success_nodes)
+                       success_nodes = makeNode(ExecNodes);
+               success_nodes->nodeList = lappend_int(success_nodes->nodeList, nodenum);
+       }
+       else
+       {
+               if (failednodes->len == 0)
+                       appendStringInfo(failednodes, "Error message received from nodes:");
+               appendStringInfo(failednodes, " %s", get_pgxc_nodename(handle->nodeoid));
+       }
I have fundamentally nothing against that, but just to say that if you are going to add a test case for this feature, you are sure to get an error message that is not consistent among clusters, as it is based on the node name. If it is possible, simply removing the context message will be enough.

5) Could you add a comment on top of pgxc_all_success_nodes? You also do not need the ":" after set_dbcleanup_callback and AtEOXact_DBCleanup in their headers; something like this would be OK for clarity:
/*
 * $FUNCTIONNAME
 * $COMMENT
 */
When defining a function, the return type of the function is always on top of the function name on a separate line; this is a postgresql convention :)

I also spent some time testing the feature, and well, I haven't noticed problems. So, if you correct the minor problems in the code and add the regression test as a new set called, for example, xc_tablespace, it will be OK. As it will be a tablespace test, it will depend on a local directory, so it will be necessary to put it in src/test/regress/input.

Regards,

On Mon, Aug 27, 2012 at 2:57 PM, Amit Khandekar <ami...@en...> wrote:
> In the earlier patch I had used xact abort callback functions to do the cleanup.
Now in the new patch (attached) even the *commit* > calback function is used. > > So, in case of alter-database-set-tablespace, after the operation is > successful in all nodes, the CommitTransaction() invokes the > AtEOXact_DBCleanup() function (among other such functions). This > ultimately causes the new function movedb_success_callback() to be > called. This in turn does the original tablespace directory cleanup. > > This way, we don't have to explicitly send an on-success-cleanup > function call from coordinator. It will happen on each individual node > as a on-commit callback routine. So in effect, there is no need of the > pg_rm_tablespacepath() function that I had defined in earlier patch. I > have removed that code in this new patch. > > I am done with these changes now. This patch is for formal review. Bug > id: 3561969. > > Statements supported through this patch are: > > CREATE DATABASE > CREATE TABLESPACE > ALTER DATABASE SET TABLESPACE > > Some more comments to Michael's comments are embedded inline below ... > > Regression > -------------- > > Unfortunately I could not come up with an automated regression test. > The way this needs to be tested requires some method to abort the > statement on *particular* node, not all nodes. I do this manually by > creating some files in the new tablespace path of a node, so that the > create-tablespace or alter-database errors out on that particular node > due to presence of pre-existing files. We cannot dynamically determine > this patch because it is made up of oids. So this I didn't manage to > automate as part of regression test. If anyone has ideas, that is > welcome. > > Recently something seems to have changed in my system after I > reinstalled Ubuntu: the prepared_xact test has again started hanging > in DROP TABLE. Also, xc_for_update is showing "node not defined" > errors: > COMMIT PREPARED 'tbl_mytab1_locked'; > + ERROR: PGXC Node COORD_1: object not defined > > All of this happens without my patch applied. Has anyone seen this > lately? (If required, we will discuss this in another thread subject, > not this mail thread) > > Otherwise, there are no new regression diffs with my patch. > If you have a test case or more details about that, could you begin another thread? It is not related to this patch review. Btw, I cannot reproduce that neither on buildfarm nor in my environments. > Thanks > -Amit > > On 16 August 2012 15:24, Michael Paquier <mic...@gm...> > wrote: > > > > Hi, > > > > I am just having a quick look at this patch. > > And here are my comments so far. > > > > 1) pgxc_rm_tabspcpath a too complicated name? Something like > > pgxc_remove_tablespace_path is longer but at least explicit. Other ideas > are > > welcome. > > For example there are in postgres functions named like > > pg_stat_get_backend_activity_start with long but explicit names. > > If you are going to create several functions like this one, we should > have > > a similar naming policy. > > 2) In pgxc_rm_tabspcpath, you should add at least a permission on the > > tablespace. > > 3) You should rename get_default_tablespace to get_db_default_tablespace, > > as we get the tablespace for a given database. > > As mentioned above, now these functions are redundant because we don't > have to explicitly call cleanup functions. > > > 4 ) I am not sure that alterdb_tbsp_name should be in dbcommands.c as it > > is only called from utility.c. Why not creating a static function for > that > > in utility.c? 
> > IMO, this is a AlterDB statement code, it should be in dbcommands.c . > I'm OK with that. > > > Or are you planning to extend that in a close future? > > In order to reduce the footprint of this code in AlterDatabaseStmt, you > > could also create a separate function dedicated to this treatment and > > incorporate alterdb_tbsp_name inside it. > > Now, anyway, the new code in utility.c is very few lines. > > > 5) We should be very careful with the design of the APIs > get_success_nodes > > and pgxc_all_success_nodes as this could play an important role in the > > future error handling refactoring. > > For now, I have kept these functions as-is. We might change them in > the forthcoming error handling work. > > > I don't have any idea now, but I am sure > > I will have some ideas tomorrow morning about that. > > > > That's all for the time being, I will come back to this patch tomorrow > > however for more comments. > > > > On Thu, Aug 16, 2012 at 2:02 PM, Amit Khandekar > > <ami...@en...> wrote: > >> > >> PFA patch for the support for running : > >> ALTER DATABASE SET TABLESPACE ... > >> in a transaction-safe manner. > >> > >> If one of the nodes returns error, the database won't be affected on any > >> of the nodes because now the statement runs in a transaction block on > remote > >> nodes. > >> > >> The two tasks the stmt executes are : > >> 1. Copy tablespace files into the new tablespace path, and commit > >> 2. Remove original tablespace path, record WAL log for this, and commit. > >> > >> These 2 tasks are now invoked separately from the coordinator. It moves > >> over to the task 2 only after it completes task 1 on all the nodes. > >> > >> Task 1: If task 1 fails, the newly created tablespace directory > structure > >> gets cleaned up by propogating a new function call pgxc_rm_tabspcpath() > from > >> coordinator onto the successful nodes. The failed nodes automatically do > >> this cleanup due to the existing PG_ENSURE callback mechanism in this > code. > >> > >> This is what the user gets when the statement fails during the first > >> commit (this case, the target directory had some files on data_node_1) : > >> > >> postgres=# alter database db1 set tablespace tsp2; > >> ERROR: some relations of database "db1" are already in tablespace > "tsp2" > >> CONTEXT: Error message received from nodes: data_node_1 > >> postgres=# > >> > >> I tried to see if we can avoid explicitly calling the cleanup function > >> and instead use some rollback callback mechanism which will > automatically do > >> the above cleanup during AbortTransaction() on each nodes, but I am not > sure > >> we can do so. There is the function RegisterXactCallback() to do this > for > >> dynamically loaded modules, but not sure of the consequences if we do > the > >> cleanup using this. > >> > >> > >> Task 2: The task 2 is nothing but removal of old tablespace directories. > >> By any chance, if the directory can't be cleaned up, the PG code > returns a > >> warning, not an error. But in XC, we don't yet seem to have the support > for > >> returning warnings from remote node. So currently, if the old tablespace > >> directories can't be cleaned up, we are silently returning, but with the > >> database consistently set it's new tablespace on all nodes. > >> > >> I think such issues of getting user-friendly error messages in general > >> will be tackled correctly in the next error-handling project. > >> > >> > >> The patch is not yet ready to checkin, though it has working > >> functionality. 
I want to make the function ExecUtilityWithCleanup() > >> re-usable for the other commands. Currently it can be used only for > ALTER > >> DATABASE SET TABLESPACE. With some minor changes, it can be made a base > >> function for other commands. > >> > >> Once I send the final patch, we can review it, but anyone feel free to > >> send comments anytime. > > On 22 August 2012 10:57, Amit Khandekar <ami...@en...> > wrote: > > PFA patch to support running : > > ALTER DATABASE SET TABLESPACE > > CREATE DATABASE > > CREATE TABLESPACE > > in a transaction-safe manner. > > > > Since these statements don't run inside a transaction block, an error in > one > > of the nodes leaves the cluster in an inconsistent state, and the user is > > not able to re-run the statement. > > > > With the patch, if one of the nodes returns error, the database won't be > > affected on any of the nodes because now the statement runs in a > transaction > > block on remote nodes. > > > > When one node fails, we need to cleanup the files created on successful > > nodes. Due to this, for each of the above statements, we now register a > > callback function to be called during AbortTransaction(). I have > hardwired a > > new function AtEOXact_DBCleanup() to be called in AbortTransaction(). > This > > callback mechanism will automatically do the above cleanup during > > AbortTransaction() on each nodes. There is this function > > RegisterXactCallback() to do this for dynamically loaded modules, but it > > makes sense to instead add a separate new function, because the DB > cleanup > > is in-built backend code. > > > > > > ---------- > > ALTER DATABASE SET TABLESPACE > > > > For ALTER DATABASE SET TABLESPACE, the stmt executes two tasks as two > > separate commits : > > 1. Copy tablespace files into the new tablespace path, and commit > > 2. Remove original tablespace path, record WAL log for this, and commit. > > > > These 2 tasks are now invoked separately from the coordinator. It moves > over > > to the task 2 only after it completes task 1 on all the nodes. > > > > This is what the user now gets when the statement fails during the first > > commit (this case, the target directory had some files on data_node_1) : > > > > postgres=# alter database db1 set tablespace tsp2; > > ERROR: some relations of database "db1" are already in tablespace "tsp2" > > CONTEXT: Error message received from nodes: data_node_1 > > postgres=# > > > > > > > > Task 2: The task 2 is nothing but removal of old tablespace directories. > By > > any chance, if the directory can't be cleaned up, the PG code returns a > > warning, not an error. But in XC, we don't yet seem to have the support > for > > returning warnings from remote node. So currently, if the old tablespace > > directories can't be cleaned up, we are silently returning, but with the > > database consistently set it's new tablespace on all nodes. > > > > > > ---------- > > > > This patch is not yet ready for checkin. It needs more testing, and a new > > regression test. But let me know if anybody identifies any issues, > > especially the rollback callback mechanism that is used to cleanup the > files > > on transaction abort. > > > > Yet to support other statements like DROP TABLESPACE, DROP DATABASE. > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. 
Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > -- Michael Paquier http://michael.otacoo.com |
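A rough sketch of the test helper suggested in point 3) above, under the assumption that the pgxc_node catalog can be used to map a node number to a node name; the function name, signature and quoting are illustrative and not taken from xc_create_function.sql.

-- Create a tablespace on a single Datanode only, so that a subsequent
-- cluster-wide CREATE TABLESPACE with the same name fails on that node
-- and the patch's rollback/cleanup path gets exercised.
CREATE OR REPLACE FUNCTION create_tablespace_on_node(tbspc_name text,
                                                     tbspc_path text,
                                                     nodenum int)
RETURNS void AS $$
DECLARE
    nodename text;
BEGIN
    -- Pick the nodenum-th Datanode name from the catalog.
    SELECT node_name INTO nodename
      FROM pgxc_node
     WHERE node_type = 'D'
     ORDER BY node_name
    OFFSET nodenum - 1 LIMIT 1;

    -- Run the utility statement on that node only.
    EXECUTE 'EXECUTE DIRECT ON (' || nodename || ') '
         || quote_literal(format('CREATE TABLESPACE %I LOCATION %L',
                                 tbspc_name, tbspc_path));
END;
$$ LANGUAGE plpgsql;

A test would call this for the first Datanode, then run a plain CREATE TABLESPACE with the same name from the Coordinator and check that the failure on node 1 leaves no partially created tablespace behind on the other nodes.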
From: Michael P. <mic...@gm...> - 2012-08-27 13:37:01
|
On Mon, Aug 27, 2012 at 9:23 PM, Ashutosh Bapat < ash...@en...> wrote: > > > On Fri, Aug 24, 2012 at 9:35 AM, Michael Paquier < > mic...@gm...> wrote: > >> Hi all, >> >> I am currently looking at the implementation of triggers and designed >> some simple test sets that I am planning to extend for multiple table types. >> Those tests contains I think the most commonly used triggers used in PG >> based on the events (INSERT, UPDATE, DELETE, nothing for TRUNCATE yet) and >> the firing moments (BEFORE, AFTER, INSTEAD OF). >> You can find them attached, they run correctly on vanilla. >> >> Btw, I already designed and implemented some code allowing to evaluate if >> a trigger can be safely shipped to Datanodes, and also some code to create, >> drop and rename triggers safely in cluster. >> My code is published here (don't mind it being public): >> https://github.com/michaelpq/pgxc/tree/trigger. >> Those are the basics I think, and I already coded them. >> >> For the time being, a trigger is defined a shippable to remote nodes if >> it satisfies all those conditions: >> - the query invocating it is entirely shippable, former FQS code, no >> nothing to do >> - the procedure launched is shippable, code already implemented >> - the trigger is of type BEFORE/AFTER (INSTEAD OF can only be defined on >> views, and views are only available on Coordinators), code already >> implemented >> - event is of type INSERT/DELETE/UPDATE. Events on TRUNCATE need to be >> launched at Coordinator to keep global control. (Events triggers in 9.3 >> will be the same), code already implemented >> >> Then, in order to fundamentaly being able to support all the types of >> triggers, 2 things are needed: >> 1) Better implementation support for parameters, particularly cached >> plans and functions >> Now when a trigger is launched it fails with the following error: >> postgres=# INSERT INTO my_table VALUES (1,2); >> ERROR: cache lookup failed for type 0 >> CONTEXT: SQL statement "UPDATE table_stats SET num_insert_row = >> num_insert_row + 1 WHERE table_name = TG_TABLE_NAME" >> PL/pgSQL function count_insert_row() line 3 at SQL statement >> This means that the parameters used inside the executed query are not set >> properly in the cached plan used by the plpgsql function invocated by >> trigger >> > > Where are parameters involved here? Please provide full testcase. The > trigger definitions are missing. > The test cases are attached. You also need to fetch the code from this branch: https://github.com/michaelpq/pgxc/tree/trigger You will see why fixing the parameters first is really important... > >> 2) Implement an alternative to GetTupleForTrigger in trigger.c which is >> cluster-wide. GetTupleForTrigger is used by AFTER ROW triggers to get a >> tuple to be modified from disk. This API is used to get a local tuple, but >> in the case of XC we need something allowing to fetch a tuple from a remote >> node. I am thinking about using some of the APIs in remotecopy.c to get >> easily this information from remote nodes. >> >> Point 1) is really important as it currently impacts several tests cases >> (plpgsql, plancache, rangefuncs: issue 3553038 in tracker), so I will work >> on that first and submit a patch only for that. >> Once 1) is done, 2) will include a more complete trigger implementation >> with the FQS code I already wrote and the alternative to GetTupleForTrigger >> (called GetRemoteTupleForTrigger?). 
>> >> As far as I could see, those are the things I need to do to have a full >> implementation of triggers. >> Please note that I might have forgotten things that I did not notice; >> in that case I will update my analysis once 1) is done. >> >> Regards. >> -- >> Michael Paquier >> http://michael.otacoo.com >> >> > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > -- Michael Paquier http://michael.otacoo.com |
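The trigger definitions referred to above were attached to the mail and are not preserved in this archive. A hypothetical reconstruction that is consistent with the quoted error context (the table and function names come from the error message; the column layouts of my_table and table_stats are assumptions) would be:

CREATE TABLE table_stats (table_name name PRIMARY KEY, num_insert_row int DEFAULT 0);
CREATE TABLE my_table (a int, b int);
INSERT INTO table_stats VALUES ('my_table');

CREATE FUNCTION count_insert_row() RETURNS trigger AS $$
BEGIN
    -- TG_TABLE_NAME becomes a parameter of the cached plan for this UPDATE;
    -- mishandling of that parameter is what produces the
    -- "cache lookup failed for type 0" error quoted above.
    UPDATE table_stats SET num_insert_row = num_insert_row + 1
        WHERE table_name = TG_TABLE_NAME;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER my_table_count_insert AFTER INSERT ON my_table
    FOR EACH ROW EXECUTE PROCEDURE count_insert_row();

INSERT INTO my_table VALUES (1,2);  -- fails on XC with the error quoted above

The INSERT is the statement shown failing in the report; as stated in the thread, the same script runs cleanly on vanilla PostgreSQL.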
From: Ashutosh B. <ash...@en...> - 2012-08-27 12:23:19
|
On Fri, Aug 24, 2012 at 9:35 AM, Michael Paquier <mic...@gm...>wrote: > Hi all, > > I am currently looking at the implementation of triggers and designed some > simple test sets that I am planning to extend for multiple table types. > Those tests contains I think the most commonly used triggers used in PG > based on the events (INSERT, UPDATE, DELETE, nothing for TRUNCATE yet) and > the firing moments (BEFORE, AFTER, INSTEAD OF). > You can find them attached, they run correctly on vanilla. > > Btw, I already designed and implemented some code allowing to evaluate if > a trigger can be safely shipped to Datanodes, and also some code to create, > drop and rename triggers safely in cluster. > My code is published here (don't mind it being public): > https://github.com/michaelpq/pgxc/tree/trigger. > Those are the basics I think, and I already coded them. > > For the time being, a trigger is defined a shippable to remote nodes if it > satisfies all those conditions: > - the query invocating it is entirely shippable, former FQS code, no > nothing to do > - the procedure launched is shippable, code already implemented > - the trigger is of type BEFORE/AFTER (INSTEAD OF can only be defined on > views, and views are only available on Coordinators), code already > implemented > - event is of type INSERT/DELETE/UPDATE. Events on TRUNCATE need to be > launched at Coordinator to keep global control. (Events triggers in 9.3 > will be the same), code already implemented > > Then, in order to fundamentaly being able to support all the types of > triggers, 2 things are needed: > 1) Better implementation support for parameters, particularly cached plans > and functions > Now when a trigger is launched it fails with the following error: > postgres=# INSERT INTO my_table VALUES (1,2); > ERROR: cache lookup failed for type 0 > CONTEXT: SQL statement "UPDATE table_stats SET num_insert_row = > num_insert_row + 1 WHERE table_name = TG_TABLE_NAME" > PL/pgSQL function count_insert_row() line 3 at SQL statement > This means that the parameters used inside the executed query are not set > properly in the cached plan used by the plpgsql function invocated by > trigger > Where are parameters involved here? Please provide full testcase. The trigger definitions are missing. > 2) Implement an alternative to GetTupleForTrigger in trigger.c which is > cluster-wide. GetTupleForTrigger is used by AFTER ROW triggers to get a > tuple to be modified from disk. This API is used to get a local tuple, but > in the case of XC we need something allowing to fetch a tuple from a remote > node. I am thinking about using some of the APIs in remotecopy.c to get > easily this information from remote nodes. > > Point 1) is really important as it currently impacts several tests cases > (plpgsql, plancache, rangefuncs: issue 3553038 in tracker), so I will work > on that first and submit a patch only for that. > Once 1) is done, 2) will include a more complete trigger implementation > with the FQS code I already wrote and the alternative to GetTupleForTrigger > (called GetRemoteTupleForTrigger?). > > As far as I could see, those are the things I need to do to have a fully > implementation for triggers. > Please note that I might have forgotten things that I did not notice, in > consequence I will update my analysis once 1) is done. > > Regards. 
> -- > Michael Paquier > http://michael.otacoo.com -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Michael P. <mic...@gm...> - 2012-08-27 12:17:42
|
On Mon, Aug 27, 2012 at 6:48 PM, Ashutosh Bapat < ash...@en...> wrote: > > > On Mon, Aug 27, 2012 at 2:53 PM, Michael Paquier < > mic...@gm...> wrote: > >> Hi all, >> >> Please find attached a patch that fixes many parameter issues we saw >> until now with plpgsql functions and prepared plans. >> It is now possible to run plpgsql functions with DML and SELECT queries >> without problems, at least regressions run fine and things like the SQL >> attached run now finely. >> > > We need a better way to handle parameters. This patch only adds on top of > the existing infrastructure, which is buggy. So, when we will come to > refactoring it, we will need to take care of your changes. Instead, we > should really look at holistic solution for hanlding parameters which can > handle all kinds of parameters gracefully. > If I got a look at the order of priority of things to be done, this infrastructure looks good for the time being. I want btw the come back to the refactoring of executor code later, but after doing the trigger implementation. Also, refactoring all the code in executor and planner related to parameter handling will require an effort that cannot be done for the time being, which I would estimate to 2 months. You need to believe me here. > >> >> So, it fixes 2 several issues in regression tests, and includes a fix for >> plpgsql and rangefuncs tests. >> plancache is also showing less errors. >> > > I see a lot of unrelated diffs. Can you please segregate the patches. For > example there is on diff related to round robin vs roundrobin. > This was from the old plpgsql output which was not possible to fix before this patch. Patch fix implies: correct output. > > >> >> This is a prerequisite for triggers, so if there are no comments, I will >> go ahead and commit on tomorrow morning JP time. >> And move on to the real trigger implementation. >> > > Can you please elaborate the reasons? Why do you need this before > implementing the triggers? > I sent an email about that yesterday. Triggers fire functions. Triggers fire plpgsql functions particularly. If plpgsql functions do not work, what is the point? > I thought these both things are unrelated. > i think on the contrary that both are related. What is the point of having triggers if you are not able to fire procedures correctly? > > I am not in favour of committing a half cooked solution again for > parameter handling. > Regarding the time I got here that looks to be a good deal, and fixes 2 regressions. -- Michael Paquier http://michael.otacoo.com |
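As a generic illustration (not one of the attached regression cases) of the class of statement this parameter work targets: any plpgsql function whose SQL statement references function arguments turns those references into parameters of a cached plan, and it is that parameterized plan which has to reach the Datanodes with the right types and values. The function and argument names below are hypothetical.

CREATE TABLE my_table (a int, b int);

CREATE FUNCTION bump_b(k int, delta int) RETURNS void AS $$
BEGIN
    -- k and delta are sent as $1 and $2 of the cached plan for this UPDATE;
    -- the Coordinator must forward their types and values correctly.
    UPDATE my_table SET b = b + delta WHERE a = k;
END;
$$ LANGUAGE plpgsql;

SELECT bump_b(1, 10);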
From: Ashutosh B. <ash...@en...> - 2012-08-27 09:48:13
|
On Mon, Aug 27, 2012 at 2:53 PM, Michael Paquier <mic...@gm...>wrote: > Hi all, > > Please find attached a patch that fixes many parameter issues we saw until > now with plpgsql functions and prepared plans. > It is now possible to run plpgsql functions with DML and SELECT queries > without problems, at least regressions run fine and things like the SQL > attached run now finely. > We need a better way to handle parameters. This patch only adds on top of the existing infrastructure, which is buggy. So, when we will come to refactoring it, we will need to take care of your changes. Instead, we should really look at holistic solution for hanlding parameters which can handle all kinds of parameters gracefully. > > So, it fixes 2 several issues in regression tests, and includes a fix for > plpgsql and rangefuncs tests. > plancache is also showing less errors. > I see a lot of unrelated diffs. Can you please segregate the patches. For example there is on diff related to round robin vs roundrobin. It should have gone with the respective patch. Why didn't we take care of it, while committing that patch? > > This is a prerequisite for triggers, so if there are no comments, I will > go ahead and commit on tomorrow morning JP time. > And move on to the real trigger implementation. > Can you please elaborate the reasons? Why do you need this before implementing the triggers? I thought these both things are unrelated. I am not in favour of committing a half cooked solution again for parameter handling. > Regards, > -- > Michael Paquier > http://michael.otacoo.com > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Amit K. <ami...@en...> - 2012-08-27 05:58:22
|
In the earlier patch I had used xact abort callback functions to do the cleanup. Now in the new patch (attached) even the *commit* calback function is used. So, in case of alter-database-set-tablespace, after the operation is successful in all nodes, the CommitTransaction() invokes the AtEOXact_DBCleanup() function (among other such functions). This ultimately causes the new function movedb_success_callback() to be called. This in turn does the original tablespace directory cleanup. This way, we don't have to explicitly send an on-success-cleanup function call from coordinator. It will happen on each individual node as a on-commit callback routine. So in effect, there is no need of the pg_rm_tablespacepath() function that I had defined in earlier patch. I have removed that code in this new patch. I am done with these changes now. This patch is for formal review. Bug id: 3561969. Statements supported through this patch are: CREATE DATABASE CREATE TABLESPACE ALTER DATABASE SET TABLESPACE Some more comments to Michael's comments are embedded inline below ... Regression -------------- Unfortunately I could not come up with an automated regression test. The way this needs to be tested requires some method to abort the statement on *particular* node, not all nodes. I do this manually by creating some files in the new tablespace path of a node, so that the create-tablespace or alter-database errors out on that particular node due to presence of pre-existing files. We cannot dynamically determine this patch because it is made up of oids. So this I didn't manage to automate as part of regression test. If anyone has ideas, that is welcome. Recently something seems to have changed in my system after I reinstalled Ubuntu: the prepared_xact test has again started hanging in DROP TABLE. Also, xc_for_update is showing "node not defined" errors: COMMIT PREPARED 'tbl_mytab1_locked'; + ERROR: PGXC Node COORD_1: object not defined All of this happens without my patch applied. Has anyone seen this lately? (If required, we will discuss this in another thread subject, not this mail thread) Otherwise, there are no new regression diffs with my patch. Thanks -Amit On 16 August 2012 15:24, Michael Paquier <mic...@gm...> wrote: > > Hi, > > I am just having a quick look at this patch. > And here are my comments so far. > > 1) pgxc_rm_tabspcpath a too complicated name? Something like > pgxc_remove_tablespace_path is longer but at least explicit. Other ideas are > welcome. > For example there are in postgres functions named like > pg_stat_get_backend_activity_start with long but explicit names. > If you are going to create several functions like this one, we should have > a similar naming policy. > 2) In pgxc_rm_tabspcpath, you should add at least a permission on the > tablespace. > 3) You should rename get_default_tablespace to get_db_default_tablespace, > as we get the tablespace for a given database. As mentioned above, now these functions are redundant because we don't have to explicitly call cleanup functions. > 4 ) I am not sure that alterdb_tbsp_name should be in dbcommands.c as it > is only called from utility.c. Why not creating a static function for that > in utility.c? IMO, this is a AlterDB statement code, it should be in dbcommands.c . > Or are you planning to extend that in a close future? > In order to reduce the footprint of this code in AlterDatabaseStmt, you > could also create a separate function dedicated to this treatment and > incorporate alterdb_tbsp_name inside it. 
Now, anyway, the new code in utility.c is very few lines. > 5) We should be very careful with the design of the APIs get_success_nodes > and pgxc_all_success_nodes as this could play an important role in the > future error handling refactoring. For now, I have kept these functions as-is. We might change them in the forthcoming error handling work. > I don't have any idea now, but I am sure > I will have some ideas tomorrow morning about that. > > That's all for the time being, I will come back to this patch tomorrow > however for more comments. > > On Thu, Aug 16, 2012 at 2:02 PM, Amit Khandekar > <ami...@en...> wrote: >> >> PFA patch for the support for running : >> ALTER DATABASE SET TABLESPACE ... >> in a transaction-safe manner. >> >> If one of the nodes returns error, the database won't be affected on any >> of the nodes because now the statement runs in a transaction block on remote >> nodes. >> >> The two tasks the stmt executes are : >> 1. Copy tablespace files into the new tablespace path, and commit >> 2. Remove original tablespace path, record WAL log for this, and commit. >> >> These 2 tasks are now invoked separately from the coordinator. It moves >> over to the task 2 only after it completes task 1 on all the nodes. >> >> Task 1: If task 1 fails, the newly created tablespace directory structure >> gets cleaned up by propogating a new function call pgxc_rm_tabspcpath() from >> coordinator onto the successful nodes. The failed nodes automatically do >> this cleanup due to the existing PG_ENSURE callback mechanism in this code. >> >> This is what the user gets when the statement fails during the first >> commit (this case, the target directory had some files on data_node_1) : >> >> postgres=# alter database db1 set tablespace tsp2; >> ERROR: some relations of database "db1" are already in tablespace "tsp2" >> CONTEXT: Error message received from nodes: data_node_1 >> postgres=# >> >> I tried to see if we can avoid explicitly calling the cleanup function >> and instead use some rollback callback mechanism which will automatically do >> the above cleanup during AbortTransaction() on each nodes, but I am not sure >> we can do so. There is the function RegisterXactCallback() to do this for >> dynamically loaded modules, but not sure of the consequences if we do the >> cleanup using this. >> >> >> Task 2: The task 2 is nothing but removal of old tablespace directories. >> By any chance, if the directory can't be cleaned up, the PG code returns a >> warning, not an error. But in XC, we don't yet seem to have the support for >> returning warnings from remote node. So currently, if the old tablespace >> directories can't be cleaned up, we are silently returning, but with the >> database consistently set it's new tablespace on all nodes. >> >> I think such issues of getting user-friendly error messages in general >> will be tackled correctly in the next error-handling project. >> >> >> The patch is not yet ready to checkin, though it has working >> functionality. I want to make the function ExecUtilityWithCleanup() >> re-usable for the other commands. Currently it can be used only for ALTER >> DATABASE SET TABLESPACE. With some minor changes, it can be made a base >> function for other commands. >> >> Once I send the final patch, we can review it, but anyone feel free to >> send comments anytime. 
On 22 August 2012 10:57, Amit Khandekar <ami...@en...> wrote: > PFA patch to support running : > ALTER DATABASE SET TABLESPACE > CREATE DATABASE > CREATE TABLESPACE > in a transaction-safe manner. > > Since these statements don't run inside a transaction block, an error in one > of the nodes leaves the cluster in an inconsistent state, and the user is > not able to re-run the statement. > > With the patch, if one of the nodes returns error, the database won't be > affected on any of the nodes because now the statement runs in a transaction > block on remote nodes. > > When one node fails, we need to cleanup the files created on successful > nodes. Due to this, for each of the above statements, we now register a > callback function to be called during AbortTransaction(). I have hardwired a > new function AtEOXact_DBCleanup() to be called in AbortTransaction(). This > callback mechanism will automatically do the above cleanup during > AbortTransaction() on each nodes. There is this function > RegisterXactCallback() to do this for dynamically loaded modules, but it > makes sense to instead add a separate new function, because the DB cleanup > is in-built backend code. > > > ---------- > ALTER DATABASE SET TABLESPACE > > For ALTER DATABASE SET TABLESPACE, the stmt executes two tasks as two > separate commits : > 1. Copy tablespace files into the new tablespace path, and commit > 2. Remove original tablespace path, record WAL log for this, and commit. > > These 2 tasks are now invoked separately from the coordinator. It moves over > to the task 2 only after it completes task 1 on all the nodes. > > This is what the user now gets when the statement fails during the first > commit (this case, the target directory had some files on data_node_1) : > > postgres=# alter database db1 set tablespace tsp2; > ERROR: some relations of database "db1" are already in tablespace "tsp2" > CONTEXT: Error message received from nodes: data_node_1 > postgres=# > > > > Task 2: The task 2 is nothing but removal of old tablespace directories. By > any chance, if the directory can't be cleaned up, the PG code returns a > warning, not an error. But in XC, we don't yet seem to have the support for > returning warnings from remote node. So currently, if the old tablespace > directories can't be cleaned up, we are silently returning, but with the > database consistently set it's new tablespace on all nodes. > > > ---------- > > This patch is not yet ready for checkin. It needs more testing, and a new > regression test. But let me know if anybody identifies any issues, > especially the rollback callback mechanism that is used to cleanup the files > on transaction abort. > > Yet to support other statements like DROP TABLESPACE, DROP DATABASE. |
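To make the intended behaviour concrete, this is the kind of Coordinator session the patch aims to support (a sketch; it assumes the failure on data_node_1 is caused by stray files that the administrator then removes):

postgres=# ALTER DATABASE db1 SET TABLESPACE tsp2;
ERROR:  some relations of database "db1" are already in tablespace "tsp2"
CONTEXT:  Error message received from nodes: data_node_1
-- the on-abort callback cleans up the partially copied files on the nodes
-- that succeeded, so db1 is untouched everywhere
-- after removing the offending files from data_node_1's tsp2 directory:
postgres=# ALTER DATABASE db1 SET TABLESPACE tsp2;
ALTER DATABASE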
From: Nikhil S. <ni...@st...> - 2012-08-24 07:07:06
|
> 1) Get the current value of sequence by using currval inside pg_dump and not > the cached info on each local node. This way you just need to modify pg_dump > 2) Put a filter when the current value of a RELKIND_SEQUENCE is requested on > a node to retrieve its current value from GTM instead of the local database. > > I am not honestly a fan of solution 2, the cached information on each node > being different because of the architecture nature of XC, so we can live > with that I think. And I don't think we should put too deep our hands in the > fundamentals of such relation handling just for this need. > The fix inside pg_dump looks better to my mind. And we already modified > pg_dump to incorporate distribution information. > Comments? Hmmm, ideally we should treat sequences like the global objects that they are. So similar to the planner having the knowledge of which all datanodes to contact for a specific table, we should also incorporate logic to contact the GTM whenever a sequence is being queried internally. This will always provide a consistent view of the sequence from any coordinator. But I guess this will be a pretty invasive change.. Regards, Nikhils -- StormDB - http://www.stormdb.com The Database Cloud Postgres-XC Support and Service |
From: Michael P. <mic...@gm...> - 2012-08-24 00:20:23
|
On Thu, Aug 23, 2012 at 11:07 PM, Nikhil Sontakke <ni...@st...>wrote: > Hi, > > We were bitten by this issue recently. > > Consider a small test with two coordinators c1 and c2: > > On c1: > > postgres=# CREATE SEQUENCE serial; > postgres=# select nextval('serial'); > postgres=# select nextval('serial'); > postgres=# select nextval('serial'); > postgres=# select nextval('serial'); > > Now go to c2 and do a pg_dump of this database. Going to c2 is IMPORTANT. > > In the dump you will see the following: > > SELECT pg_catalog.setval('serial', 1, false); > > If you were using this to backup your db, your sequence info has gone > for a toss! :( > Pretty bad behavior by pg_dump here. > > The issue is because pg_dump is directly quering local info to get the > lastvalue. It should be querying the GTM to get this info. I started > looking into this a bit but then thought better to discuss it here. > What's the best way to ask the GTM for the latest lastval of all > sequences while doing a dump? > Yes, because sequence is defined as a relation on each node, and I think that pg_dump requests this relation data and not a simple "select currval();". It is more simple to get all the information of sequence in a single shot. Yes, this information is inconsistent among Coordinators for the current value because of the way sequences are defined on local nodes. I see 2 ways to fix that: 1) Get the current value of sequence by using currval inside pg_dump and not the cached info on each local node. This way you just need to modify pg_dump 2) Put a filter when the current value of a RELKIND_SEQUENCE is requested on a node to retrieve its current value from GTM instead of the local database. I am not honestly a fan of solution 2, the cached information on each node being different because of the architecture nature of XC, so we can live with that I think. And I don't think we should put too deep our hands in the fundamentals of such relation handling just for this need. The fix inside pg_dump looks better to my mind. And we already modified pg_dump to incorporate distribution information. Comments? -- Michael Paquier http://michael.otacoo.com |
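To make option 1 concrete: what pg_dump reads today is the sequence's node-local relation data, while nextval()/currval() are served through the GTM. A sketch of the difference as seen from the second Coordinator, assuming the four nextval('serial') calls from Nikhil's report, default increment and no sequence caching:

SELECT last_value, is_called FROM serial;
-- returns (1, false) on c2: the local relation data pg_dump currently uses,
-- which is exactly what ends up as setval('serial', 1, false) in the dump
SELECT nextval('serial');
-- returns 5: the GTM-backed value, i.e. the information option 1 would have
-- pg_dump fetch instead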
From: Nikhil S. <ni...@st...> - 2012-08-23 14:29:47
|
Hi, We were bitten by this issue recently. Consider a small test with two coordinators c1 and c2: On c1: postgres=# CREATE SEQUENCE serial; postgres=# select nextval('serial'); postgres=# select nextval('serial'); postgres=# select nextval('serial'); postgres=# select nextval('serial'); Now go to c2 and do a pg_dump of this database. Going to c2 is IMPORTANT. In the dump you will see the following: SELECT pg_catalog.setval('serial', 1, false); If you were using this to back up your db, your sequence info has gone for a toss! :( Pretty bad behavior by pg_dump here. The issue is that pg_dump directly queries local info to get the last value. It should be querying the GTM to get this info. I started looking into this a bit but then thought it better to discuss it here. What's the best way to ask the GTM for the latest lastval of all sequences while doing a dump? Regards, Nikhils -- StormDB - http://www.stormdb.com The Database Cloud Postgres-XC Support and Service |
From: Michael P. <mic...@gm...> - 2012-08-23 08:33:11
|
On Thu, Aug 23, 2012 at 5:28 PM, Ashutosh Bapat < ash...@en...> wrote: > While you are at it, can you please also do these changes (which are not > your changes as such) > Prologue of Shippability_context structure has mention of FQS and Fast > Query shipping. Please change it to Shippability instead. Those functions > and structures are used for all kinds of shippability and not just FQS. > > If you can, can you please move all the shippability related code to > optimizer/utils/pgxcship.c or some file? This includes > pgxc_shippability_walker, pgxc_is_query_shippable(), > pgxc_is_expr_shippable(), relevant structures, enums, macros etc.? Will > that take too much time? You will need to create corresponding .h file to > expose the interfaces. No it's OK. I wanted to participate in this cleaning btw. Nice file name btw. > > > On Thu, Aug 23, 2012 at 12:50 PM, Ashutosh Bapat < > ash...@en...> wrote: > >> Sorry, it's probably other way round. You have removed those extra spaces >> at the end of lines. Is that right? >> >> >> On Thu, Aug 23, 2012 at 12:49 PM, Ashutosh Bapat < >> ash...@en...> wrote: >> >>> Hi Michael, >>> You patch has a lot of extra spaces at the end of lines, and thus its >>> showing a lot of extraneous diffs. Can you please clear those off? >>> >>> >>> On Thu, Aug 23, 2012 at 12:47 PM, Michael Paquier < >>> mic...@gm...> wrote: >>> >>>> OK, here is the patch cleaning up all that stuff. >>>> >>>> Here are the improvements more in details; >>>> - creation of the macro pgxc_is_func_shippable, the name you proposed >>>> was fine. >>>> - pgxc_shippability_walker and pgxc_test_shippability_reason has been >>>> moved as static functions in planner.c as they are only used there. >>>> - I also moved Shippability_context and ShippabilityStat to planner.c >>>> as those structures were just used by planner.c and postgresql_fdw.c. Those >>>> 2 structures are only related to FQS so it makes sense to keep them here. >>>> >>>> This brings really more visibility in the code, as now all the FQS >>>> stuff is only in planner.c and kept invisible from other places. >>>> It is also possible to use pgxc_is_expr_shippable as a unique entry >>>> point to evaluate the shippability of an expression. >>>> >>>> Are you OK with all those modifications? >>>> Can I commit? >>>> >>>> On Thu, Aug 23, 2012 at 3:24 PM, Ashutosh Bapat < >>>> ash...@en...> wrote: >>>> >>>>> Thanks Michael for taking this up. >>>>> >>>>> Please create a macro pgxc_is_func_shippable(func_oid) as >>>>> (func_volatile(func_oid) == PROVOLATILE_IMMUTABLE) and replace >>>>> is_immutable_func with it. You can place that macro in pgxc/planner.c, >>>>> since it's used only by pgxc_shippability_walker(). If you don't like the >>>>> name, please feel free to use a name of your liking. >>>>> pgxc_is_expr_shippable() can be moved to pgxc/planner as well, although >>>>> that's not the best way. You will also need to move it's prototype from >>>>> postgresql_fdw.h to include/pgxc/planner.h. At some point we will merge >>>>> pgxc/planner.c and pgxcplan.c. We will see what to do with these functions >>>>> then. You can safely remove deparseSql(), it's not used anywhere, I suppose. >>>>> >>>>> >>>>> On Thu, Aug 23, 2012 at 10:37 AM, Michael Paquier < >>>>> mic...@gm...> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> >>>>>> I am looking at postgresql_fdw.c and I am cleaning up the functions >>>>>> inside it. >>>>>> Please find attached a patch that removes is_immutable_func as it >>>>>> does exactly the same thing as func_volatile in lsyscache.c. 
>>>>>> There is still one function remaining in postgresql_fdw.c called >>>>>> pgxc_is_expr_shippable that is related to FQS planner. >>>>>> Ashutosh, any thoughts about where to put it? >>>>>> >>>>>> Note: I unfortunately sent this patch to the PG hackers ML... >>>>>> >>>>>> -- >>>>>> Michael Paquier >>>>>> http://michael.otacoo.com >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Best Wishes, >>>>> Ashutosh Bapat >>>>> EntepriseDB Corporation >>>>> The Enterprise Postgres Company >>>>> >>>>> >>>> >>>> >>>> -- >>>> Michael Paquier >>>> http://michael.otacoo.com >>>> >>> >>> >>> >>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Enterprise Postgres Company >>> >>> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > -- Michael Paquier http://michael.otacoo.com |
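For reference, the volatility flag that both the removed is_immutable_func() and the new pgxc_is_func_shippable() macro consult (through func_volatile()) is simply pg_proc.provolatile, so the rule can also be checked from SQL; 'i' means immutable and therefore shippable under the rule above, 's' stable and 'v' volatile. The built-in lower(text) is used here only as an example of an immutable function:

SELECT proname, provolatile
FROM pg_proc
WHERE oid = 'lower(text)'::regprocedure;
-- proname | provolatile
-- lower   | i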
From: Ashutosh B. <ash...@en...> - 2012-08-23 08:28:38
|
While you are at it, can you please also do these changes (which are not your changes as such) Prologue of Shippability_context structure has mention of FQS and Fast Query shipping. Please change it to Shippability instead. Those functions and structures are used for all kinds of shippability and not just FQS. If you can, can you please move all the shippability related code to optimizer/utils/pgxcship.c or some file? This includes pgxc_shippability_walker, pgxc_is_query_shippable(), pgxc_is_expr_shippable(), relevant structures, enums, macros etc.? Will that take too much time? You will need to create corresponding .h file to expose the interfaces. On Thu, Aug 23, 2012 at 12:50 PM, Ashutosh Bapat < ash...@en...> wrote: > Sorry, it's probably other way round. You have removed those extra spaces > at the end of lines. Is that right? > > > On Thu, Aug 23, 2012 at 12:49 PM, Ashutosh Bapat < > ash...@en...> wrote: > >> Hi Michael, >> You patch has a lot of extra spaces at the end of lines, and thus its >> showing a lot of extraneous diffs. Can you please clear those off? >> >> >> On Thu, Aug 23, 2012 at 12:47 PM, Michael Paquier < >> mic...@gm...> wrote: >> >>> OK, here is the patch cleaning up all that stuff. >>> >>> Here are the improvements more in details; >>> - creation of the macro pgxc_is_func_shippable, the name you proposed >>> was fine. >>> - pgxc_shippability_walker and pgxc_test_shippability_reason has been >>> moved as static functions in planner.c as they are only used there. >>> - I also moved Shippability_context and ShippabilityStat to planner.c as >>> those structures were just used by planner.c and postgresql_fdw.c. Those 2 >>> structures are only related to FQS so it makes sense to keep them here. >>> >>> This brings really more visibility in the code, as now all the FQS stuff >>> is only in planner.c and kept invisible from other places. >>> It is also possible to use pgxc_is_expr_shippable as a unique entry >>> point to evaluate the shippability of an expression. >>> >>> Are you OK with all those modifications? >>> Can I commit? >>> >>> On Thu, Aug 23, 2012 at 3:24 PM, Ashutosh Bapat < >>> ash...@en...> wrote: >>> >>>> Thanks Michael for taking this up. >>>> >>>> Please create a macro pgxc_is_func_shippable(func_oid) as >>>> (func_volatile(func_oid) == PROVOLATILE_IMMUTABLE) and replace >>>> is_immutable_func with it. You can place that macro in pgxc/planner.c, >>>> since it's used only by pgxc_shippability_walker(). If you don't like the >>>> name, please feel free to use a name of your liking. >>>> pgxc_is_expr_shippable() can be moved to pgxc/planner as well, although >>>> that's not the best way. You will also need to move it's prototype from >>>> postgresql_fdw.h to include/pgxc/planner.h. At some point we will merge >>>> pgxc/planner.c and pgxcplan.c. We will see what to do with these functions >>>> then. You can safely remove deparseSql(), it's not used anywhere, I suppose. >>>> >>>> >>>> On Thu, Aug 23, 2012 at 10:37 AM, Michael Paquier < >>>> mic...@gm...> wrote: >>>> >>>>> Hi, >>>>> >>>>> >>>>> I am looking at postgresql_fdw.c and I am cleaning up the functions >>>>> inside it. >>>>> Please find attached a patch that removes is_immutable_func as it does >>>>> exactly the same thing as func_volatile in lsyscache.c. >>>>> There is still one function remaining in postgresql_fdw.c called >>>>> pgxc_is_expr_shippable that is related to FQS planner. >>>>> Ashutosh, any thoughts about where to put it? 
>>>>> >>>>> Note: I unfortunately sent this patch to the PG hackers ML... >>>>> >>>>> -- >>>>> Michael Paquier >>>>> http://michael.otacoo.com >>>>> >>>> >>>> >>>> >>>> -- >>>> Best Wishes, >>>> Ashutosh Bapat >>>> EntepriseDB Corporation >>>> The Enterprise Postgres Company >>>> >>>> >>> >>> >>> -- >>> Michael Paquier >>> http://michael.otacoo.com >>> >> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Michael P. <mic...@gm...> - 2012-08-23 08:26:56
|
On Thu, Aug 23, 2012 at 5:25 PM, Ashutosh Bapat < ash...@en...> wrote: > Ok. I need to find something for VIM then. Here it is: http://vim.wikia.com/wiki/Highlight_unwanted_spaces Take care to make it affect only C files. -- Michael Paquier http://michael.otacoo.com |
From: Ashutosh B. <ash...@en...> - 2012-08-23 08:25:33
|
Ok. I need to find something for VIM then. On Thu, Aug 23, 2012 at 1:54 PM, Michael Paquier <mic...@gm...>wrote: > > > On Thu, Aug 23, 2012 at 5:11 PM, Ashutosh Bapat < > ash...@en...> wrote: > >> What's the setting? > > That's emacs stuff. You need to enter those lines in .emacs. > ;; Delete trailing whitespaces for C > (add-hook 'c-mode-hook '(lambda () > (add-hook 'write-contents-hooks 'delete-trailing-whitespace nil t))) > Here only C files are impacted, so this has no effects on files like .txt, > .sql, .out. > Pretty useful. > > >> >> On Thu, Aug 23, 2012 at 1:31 PM, Michael Paquier < >> mic...@gm...> wrote: >> >>> >>> On 2012/08/23, at 16:20, Ashutosh Bapat <ash...@en...> >>> wrote: >>> >>> Sorry, it's probably other way round. You have removed those extra >>> spaces at the end of lines. Is that right? >>> >>> Yes, my environment is set up to track and delete whites paces >>> automatically. >>> >>> >>> On Thu, Aug 23, 2012 at 12:49 PM, Ashutosh Bapat < >>> ash...@en...> wrote: >>> >>>> Hi Michael, >>>> You patch has a lot of extra spaces at the end of lines, and thus its >>>> showing a lot of extraneous diffs. Can you please clear those off? >>>> >>>> >>>> On Thu, Aug 23, 2012 at 12:47 PM, Michael Paquier < >>>> mic...@gm...> wrote: >>>> >>>>> OK, here is the patch cleaning up all that stuff. >>>>> >>>>> Here are the improvements more in details; >>>>> - creation of the macro pgxc_is_func_shippable, the name you proposed >>>>> was fine. >>>>> - pgxc_shippability_walker and pgxc_test_shippability_reason has been >>>>> moved as static functions in planner.c as they are only used there. >>>>> - I also moved Shippability_context and ShippabilityStat to planner.c >>>>> as those structures were just used by planner.c and postgresql_fdw.c. Those >>>>> 2 structures are only related to FQS so it makes sense to keep them here. >>>>> >>>>> This brings really more visibility in the code, as now all the FQS >>>>> stuff is only in planner.c and kept invisible from other places. >>>>> It is also possible to use pgxc_is_expr_shippable as a unique entry >>>>> point to evaluate the shippability of an expression. >>>>> >>>>> Are you OK with all those modifications? >>>>> Can I commit? >>>>> >>>>> On Thu, Aug 23, 2012 at 3:24 PM, Ashutosh Bapat < >>>>> ash...@en...> wrote: >>>>> >>>>>> Thanks Michael for taking this up. >>>>>> >>>>>> Please create a macro pgxc_is_func_shippable(func_oid) as >>>>>> (func_volatile(func_oid) == PROVOLATILE_IMMUTABLE) and replace >>>>>> is_immutable_func with it. You can place that macro in pgxc/planner.c, >>>>>> since it's used only by pgxc_shippability_walker(). If you don't like the >>>>>> name, please feel free to use a name of your liking. >>>>>> pgxc_is_expr_shippable() can be moved to pgxc/planner as well, although >>>>>> that's not the best way. You will also need to move it's prototype from >>>>>> postgresql_fdw.h to include/pgxc/planner.h. At some point we will merge >>>>>> pgxc/planner.c and pgxcplan.c. We will see what to do with these functions >>>>>> then. You can safely remove deparseSql(), it's not used anywhere, I suppose. >>>>>> >>>>>> >>>>>> On Thu, Aug 23, 2012 at 10:37 AM, Michael Paquier < >>>>>> mic...@gm...> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> >>>>>>> I am looking at postgresql_fdw.c and I am cleaning up the functions >>>>>>> inside it. >>>>>>> Please find attached a patch that removes is_immutable_func as it >>>>>>> does exactly the same thing as func_volatile in lsyscache.c. 
>>>>>>> There is still one function remaining in postgresql_fdw.c called >>>>>>> pgxc_is_expr_shippable that is related to FQS planner. >>>>>>> Ashutosh, any thoughts about where to put it? >>>>>>> >>>>>>> Note: I unfortunately sent this patch to the PG hackers ML... >>>>>>> >>>>>>> -- >>>>>>> Michael Paquier >>>>>>> http://michael.otacoo.com >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Best Wishes, >>>>>> Ashutosh Bapat >>>>>> EntepriseDB Corporation >>>>>> The Enterprise Postgres Company >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Michael Paquier >>>>> http://michael.otacoo.com >>>>> >>>> >>>> >>>> >>>> -- >>>> Best Wishes, >>>> Ashutosh Bapat >>>> EntepriseDB Corporation >>>> The Enterprise Postgres Company >>>> >>>> >>> >>> >>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Enterprise Postgres Company >>> >>> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> > > > -- > Michael Paquier > http://michael.otacoo.com > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Michael P. <mic...@gm...> - 2012-08-23 08:24:57
|
On Thu, Aug 23, 2012 at 5:11 PM, Ashutosh Bapat < ash...@en...> wrote: > What's the setting? That's emacs stuff. You need to enter those lines in .emacs. ;; Delete trailing whitespaces for C (add-hook 'c-mode-hook '(lambda () (add-hook 'write-contents-hooks 'delete-trailing-whitespace nil t))) Here only C files are impacted, so this has no effects on files like .txt, .sql, .out. Pretty useful. > > On Thu, Aug 23, 2012 at 1:31 PM, Michael Paquier < > mic...@gm...> wrote: > >> >> On 2012/08/23, at 16:20, Ashutosh Bapat <ash...@en...> >> wrote: >> >> Sorry, it's probably other way round. You have removed those extra spaces >> at the end of lines. Is that right? >> >> Yes, my environment is set up to track and delete whites paces >> automatically. >> >> >> On Thu, Aug 23, 2012 at 12:49 PM, Ashutosh Bapat < >> ash...@en...> wrote: >> >>> Hi Michael, >>> You patch has a lot of extra spaces at the end of lines, and thus its >>> showing a lot of extraneous diffs. Can you please clear those off? >>> >>> >>> On Thu, Aug 23, 2012 at 12:47 PM, Michael Paquier < >>> mic...@gm...> wrote: >>> >>>> OK, here is the patch cleaning up all that stuff. >>>> >>>> Here are the improvements more in details; >>>> - creation of the macro pgxc_is_func_shippable, the name you proposed >>>> was fine. >>>> - pgxc_shippability_walker and pgxc_test_shippability_reason has been >>>> moved as static functions in planner.c as they are only used there. >>>> - I also moved Shippability_context and ShippabilityStat to planner.c >>>> as those structures were just used by planner.c and postgresql_fdw.c. Those >>>> 2 structures are only related to FQS so it makes sense to keep them here. >>>> >>>> This brings really more visibility in the code, as now all the FQS >>>> stuff is only in planner.c and kept invisible from other places. >>>> It is also possible to use pgxc_is_expr_shippable as a unique entry >>>> point to evaluate the shippability of an expression. >>>> >>>> Are you OK with all those modifications? >>>> Can I commit? >>>> >>>> On Thu, Aug 23, 2012 at 3:24 PM, Ashutosh Bapat < >>>> ash...@en...> wrote: >>>> >>>>> Thanks Michael for taking this up. >>>>> >>>>> Please create a macro pgxc_is_func_shippable(func_oid) as >>>>> (func_volatile(func_oid) == PROVOLATILE_IMMUTABLE) and replace >>>>> is_immutable_func with it. You can place that macro in pgxc/planner.c, >>>>> since it's used only by pgxc_shippability_walker(). If you don't like the >>>>> name, please feel free to use a name of your liking. >>>>> pgxc_is_expr_shippable() can be moved to pgxc/planner as well, although >>>>> that's not the best way. You will also need to move it's prototype from >>>>> postgresql_fdw.h to include/pgxc/planner.h. At some point we will merge >>>>> pgxc/planner.c and pgxcplan.c. We will see what to do with these functions >>>>> then. You can safely remove deparseSql(), it's not used anywhere, I suppose. >>>>> >>>>> >>>>> On Thu, Aug 23, 2012 at 10:37 AM, Michael Paquier < >>>>> mic...@gm...> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> >>>>>> I am looking at postgresql_fdw.c and I am cleaning up the functions >>>>>> inside it. >>>>>> Please find attached a patch that removes is_immutable_func as it >>>>>> does exactly the same thing as func_volatile in lsyscache.c. >>>>>> There is still one function remaining in postgresql_fdw.c called >>>>>> pgxc_is_expr_shippable that is related to FQS planner. >>>>>> Ashutosh, any thoughts about where to put it? >>>>>> >>>>>> Note: I unfortunately sent this patch to the PG hackers ML... 
>>>>>> >>>>>> -- >>>>>> Michael Paquier >>>>>> http://michael.otacoo.com >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Best Wishes, >>>>> Ashutosh Bapat >>>>> EntepriseDB Corporation >>>>> The Enterprise Postgres Company >>>>> >>>>> >>>> >>>> >>>> -- >>>> Michael Paquier >>>> http://michael.otacoo.com >>>> >>> >>> >>> >>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Enterprise Postgres Company >>> >>> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > -- Michael Paquier http://michael.otacoo.com |
From: Michael P. <mic...@gm...> - 2012-08-23 08:01:52
|
On 2012/08/23, at 16:20, Ashutosh Bapat <ash...@en...> wrote: > Sorry, it's probably other way round. You have removed those extra spaces at the end of lines. Is that right? Yes, my environment is set up to track and delete whites paces automatically. > > On Thu, Aug 23, 2012 at 12:49 PM, Ashutosh Bapat <ash...@en...> wrote: > Hi Michael, > You patch has a lot of extra spaces at the end of lines, and thus its showing a lot of extraneous diffs. Can you please clear those off? > > > On Thu, Aug 23, 2012 at 12:47 PM, Michael Paquier <mic...@gm...> wrote: > OK, here is the patch cleaning up all that stuff. > > Here are the improvements more in details; > - creation of the macro pgxc_is_func_shippable, the name you proposed was fine. > - pgxc_shippability_walker and pgxc_test_shippability_reason has been moved as static functions in planner.c as they are only used there. > - I also moved Shippability_context and ShippabilityStat to planner.c as those structures were just used by planner.c and postgresql_fdw.c. Those 2 structures are only related to FQS so it makes sense to keep them here. > > This brings really more visibility in the code, as now all the FQS stuff is only in planner.c and kept invisible from other places. > It is also possible to use pgxc_is_expr_shippable as a unique entry point to evaluate the shippability of an expression. > > Are you OK with all those modifications? > Can I commit? > > On Thu, Aug 23, 2012 at 3:24 PM, Ashutosh Bapat <ash...@en...> wrote: > Thanks Michael for taking this up. > > Please create a macro pgxc_is_func_shippable(func_oid) as (func_volatile(func_oid) == PROVOLATILE_IMMUTABLE) and replace is_immutable_func with it. You can place that macro in pgxc/planner.c, since it's used only by pgxc_shippability_walker(). If you don't like the name, please feel free to use a name of your liking. pgxc_is_expr_shippable() can be moved to pgxc/planner as well, although that's not the best way. You will also need to move it's prototype from postgresql_fdw.h to include/pgxc/planner.h. At some point we will merge pgxc/planner.c and pgxcplan.c. We will see what to do with these functions then. You can safely remove deparseSql(), it's not used anywhere, I suppose. > > > On Thu, Aug 23, 2012 at 10:37 AM, Michael Paquier <mic...@gm...> wrote: > Hi, > > > I am looking at postgresql_fdw.c and I am cleaning up the functions inside it. > Please find attached a patch that removes is_immutable_func as it does exactly the same thing as func_volatile in lsyscache.c. > There is still one function remaining in postgresql_fdw.c called pgxc_is_expr_shippable that is related to FQS planner. > Ashutosh, any thoughts about where to put it? > > Note: I unfortunately sent this patch to the PG hackers ML... > > -- > Michael Paquier > http://michael.otacoo.com > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > > > > -- > Michael Paquier > http://michael.otacoo.com > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > |
From: Ashutosh B. <ash...@en...> - 2012-08-23 07:20:59
|
Sorry, it's probably other way round. You have removed those extra spaces at the end of lines. Is that right? On Thu, Aug 23, 2012 at 12:49 PM, Ashutosh Bapat < ash...@en...> wrote: > Hi Michael, > You patch has a lot of extra spaces at the end of lines, and thus its > showing a lot of extraneous diffs. Can you please clear those off? > > > On Thu, Aug 23, 2012 at 12:47 PM, Michael Paquier < > mic...@gm...> wrote: > >> OK, here is the patch cleaning up all that stuff. >> >> Here are the improvements more in details; >> - creation of the macro pgxc_is_func_shippable, the name you proposed was >> fine. >> - pgxc_shippability_walker and pgxc_test_shippability_reason has been >> moved as static functions in planner.c as they are only used there. >> - I also moved Shippability_context and ShippabilityStat to planner.c as >> those structures were just used by planner.c and postgresql_fdw.c. Those 2 >> structures are only related to FQS so it makes sense to keep them here. >> >> This brings really more visibility in the code, as now all the FQS stuff >> is only in planner.c and kept invisible from other places. >> It is also possible to use pgxc_is_expr_shippable as a unique entry point >> to evaluate the shippability of an expression. >> >> Are you OK with all those modifications? >> Can I commit? >> >> On Thu, Aug 23, 2012 at 3:24 PM, Ashutosh Bapat < >> ash...@en...> wrote: >> >>> Thanks Michael for taking this up. >>> >>> Please create a macro pgxc_is_func_shippable(func_oid) as >>> (func_volatile(func_oid) == PROVOLATILE_IMMUTABLE) and replace >>> is_immutable_func with it. You can place that macro in pgxc/planner.c, >>> since it's used only by pgxc_shippability_walker(). If you don't like the >>> name, please feel free to use a name of your liking. >>> pgxc_is_expr_shippable() can be moved to pgxc/planner as well, although >>> that's not the best way. You will also need to move it's prototype from >>> postgresql_fdw.h to include/pgxc/planner.h. At some point we will merge >>> pgxc/planner.c and pgxcplan.c. We will see what to do with these functions >>> then. You can safely remove deparseSql(), it's not used anywhere, I suppose. >>> >>> >>> On Thu, Aug 23, 2012 at 10:37 AM, Michael Paquier < >>> mic...@gm...> wrote: >>> >>>> Hi, >>>> >>>> >>>> I am looking at postgresql_fdw.c and I am cleaning up the functions >>>> inside it. >>>> Please find attached a patch that removes is_immutable_func as it does >>>> exactly the same thing as func_volatile in lsyscache.c. >>>> There is still one function remaining in postgresql_fdw.c called >>>> pgxc_is_expr_shippable that is related to FQS planner. >>>> Ashutosh, any thoughts about where to put it? >>>> >>>> Note: I unfortunately sent this patch to the PG hackers ML... >>>> >>>> -- >>>> Michael Paquier >>>> http://michael.otacoo.com >>>> >>> >>> >>> >>> -- >>> Best Wishes, >>> Ashutosh Bapat >>> EntepriseDB Corporation >>> The Enterprise Postgres Company >>> >>> >> >> >> -- >> Michael Paquier >> http://michael.otacoo.com >> > > > > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Ashutosh B. <ash...@en...> - 2012-08-23 07:20:01
|
Hi Michael, You patch has a lot of extra spaces at the end of lines, and thus its showing a lot of extraneous diffs. Can you please clear those off? On Thu, Aug 23, 2012 at 12:47 PM, Michael Paquier <mic...@gm... > wrote: > OK, here is the patch cleaning up all that stuff. > > Here are the improvements more in details; > - creation of the macro pgxc_is_func_shippable, the name you proposed was > fine. > - pgxc_shippability_walker and pgxc_test_shippability_reason has been > moved as static functions in planner.c as they are only used there. > - I also moved Shippability_context and ShippabilityStat to planner.c as > those structures were just used by planner.c and postgresql_fdw.c. Those 2 > structures are only related to FQS so it makes sense to keep them here. > > This brings really more visibility in the code, as now all the FQS stuff > is only in planner.c and kept invisible from other places. > It is also possible to use pgxc_is_expr_shippable as a unique entry point > to evaluate the shippability of an expression. > > Are you OK with all those modifications? > Can I commit? > > On Thu, Aug 23, 2012 at 3:24 PM, Ashutosh Bapat < > ash...@en...> wrote: > >> Thanks Michael for taking this up. >> >> Please create a macro pgxc_is_func_shippable(func_oid) as >> (func_volatile(func_oid) == PROVOLATILE_IMMUTABLE) and replace >> is_immutable_func with it. You can place that macro in pgxc/planner.c, >> since it's used only by pgxc_shippability_walker(). If you don't like the >> name, please feel free to use a name of your liking. >> pgxc_is_expr_shippable() can be moved to pgxc/planner as well, although >> that's not the best way. You will also need to move it's prototype from >> postgresql_fdw.h to include/pgxc/planner.h. At some point we will merge >> pgxc/planner.c and pgxcplan.c. We will see what to do with these functions >> then. You can safely remove deparseSql(), it's not used anywhere, I suppose. >> >> >> On Thu, Aug 23, 2012 at 10:37 AM, Michael Paquier < >> mic...@gm...> wrote: >> >>> Hi, >>> >>> >>> I am looking at postgresql_fdw.c and I am cleaning up the functions >>> inside it. >>> Please find attached a patch that removes is_immutable_func as it does >>> exactly the same thing as func_volatile in lsyscache.c. >>> There is still one function remaining in postgresql_fdw.c called >>> pgxc_is_expr_shippable that is related to FQS planner. >>> Ashutosh, any thoughts about where to put it? >>> >>> Note: I unfortunately sent this patch to the PG hackers ML... >>> >>> -- >>> Michael Paquier >>> http://michael.otacoo.com >>> >> >> >> >> -- >> Best Wishes, >> Ashutosh Bapat >> EntepriseDB Corporation >> The Enterprise Postgres Company >> >> > > > -- > Michael Paquier > http://michael.otacoo.com > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Ashutosh B. <ash...@en...> - 2012-08-23 06:24:44
|
Thanks Michael for taking this up. Please create a macro pgxc_is_func_shippable(func_oid) as (func_volatile(func_oid) == PROVOLATILE_IMMUTABLE) and replace is_immutable_func with it. You can place that macro in pgxc/planner.c, since it's used only by pgxc_shippability_walker(). If you don't like the name, please feel free to use a name of your liking. pgxc_is_expr_shippable() can be moved to pgxc/planner as well, although that's not the best way. You will also need to move it's prototype from postgresql_fdw.h to include/pgxc/planner.h. At some point we will merge pgxc/planner.c and pgxcplan.c. We will see what to do with these functions then. You can safely remove deparseSql(), it's not used anywhere, I suppose. On Thu, Aug 23, 2012 at 10:37 AM, Michael Paquier <mic...@gm... > wrote: > Hi, > > > I am looking at postgresql_fdw.c and I am cleaning up the functions inside > it. > Please find attached a patch that removes is_immutable_func as it does > exactly the same thing as func_volatile in lsyscache.c. > There is still one function remaining in postgresql_fdw.c called > pgxc_is_expr_shippable that is related to FQS planner. > Ashutosh, any thoughts about where to put it? > > Note: I unfortunately sent this patch to the PG hackers ML... > > -- > Michael Paquier > http://michael.otacoo.com > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Michael P. <mic...@gm...> - 2012-08-23 06:04:34
|
On Thu, Aug 23, 2012 at 3:03 PM, Koichi Suzuki <ko...@in...>wrote: > I think it's nice to send the patch to PG hackers ML too because it is not > specific to XC, right?] > It is specific to XC... > --- > Koichi Suzuki > > On Thu, 23 Aug 2012 14:07:32 +0900 > Michael Paquier <mic...@gm...> wrote: > > > Hi, > > > > I am looking at postgresql_fdw.c and I am cleaning up the functions > inside > > it. > > Please find attached a patch that removes is_immutable_func as it does > > exactly the same thing as func_volatile in lsyscache.c. > > There is still one function remaining in postgresql_fdw.c called > > pgxc_is_expr_shippable that is related to FQS planner. > > Ashutosh, any thoughts about where to put it? > > > > Note: I unfortunately sent this patch to the PG hackers ML... > > -- > > Michael Paquier > > http://michael.otacoo.com > -- Michael Paquier http://michael.otacoo.com |