| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Support logical replication of DDLs |
| Date: | 2022-02-21 15:53:43 |
| Message-ID: | CAAD30U+pVmfKwUKy8cbZOnUXyguJ-uBNejwD75Kyo=OjdQGJ9g@mail.gmail.com |
Hello,
One of the most frequently requested improvements from our customers
is to reduce downtime associated with software updates (both major and
minor versions). To do this, we have reviewed potential contributions to
improving logical replication.
I’m working on a patch to support logical replication of data
definition language statements (DDLs). This is a useful feature when a
database in logical replication has lots of tables, functions and
other objects that change over time, such as during an online
cross-major-version upgrade.
I put together a prototype that replicates DDLs using the generic
messages for logical decoding. The idea is to log the candidate DDL
string in ProcessUtilitySlow() using LogLogicalMessage(), with a new
flag in the WAL record type xl_logical_message indicating that it is a
DDL message. The xl_logical_message record is decoded and sent to the
subscriber via pgoutput. The logical replication worker process is
dispatched for this new DDL message type and executes the command
accordingly.
However, there are still many edge cases to sort out because not every
DDL statement can/should be replicated. Some of these include:
1. DDL involving multiple tables where only some tables are replicated, e.g.
DROP TABLE replicated_foo, notreplicated_bar;
This statement will fail on the subscriber and block logical
replication. It can be detected and filtered on the publisher.
2. Any DDL that calls a volatile function, such as NOW() or RAND(), is
likely to generate a different value on each replica. It is possible
to work around these issues; for example, the publisher can replace any
volatile function calls with a fixed return value when the statement
is logged so that the subscribers all get the same value (see the
sketch after this list). We will have to consider some other cases.
3. CREATE TABLE AS and SELECT INTO. For example:
CREATE TABLE foo AS
SELECT field_1, field_2 FROM bar;
There are a few issues that can occur here. For one, it’s possible
that table bar doesn't exist on the subscriber. Even if “bar” does
exist, it may not be fully up-to-date with the publisher, which would
cause a data mismatch on “foo” between the publisher and subscriber.
4. Statements that have nondeterministic side effects (e.g., as caused
by triggers, stored procedures, user-defined functions) may result in
different side effects occurring on each subscriber.
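To make edge case 2 concrete, here is a minimal sketch of how the
publisher could pin a volatile value before logging the statement; the
table, column and timestamp below are made up for illustration:
-- As typed on the publisher; the rewrite backfills existing rows with
-- the publisher's now():
ALTER TABLE foo ADD COLUMN created timestamptz DEFAULT now();
-- Replaying that text verbatim would evaluate now() again on every
-- subscriber, so the logged form could instead carry the value the
-- publisher computed:
ALTER TABLE foo ADD COLUMN created timestamptz DEFAULT '2022-02-21 15:53:43+00';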
Whether a DDL should be replicated also depends on the granularity at
which we define DDL replication. For example, we can define DDL
replication at these levels:
1. Database level
Allows all DDLs for a database to be replicated except for certain
edge cases (refer to the edge cases mentioned above).
This is likely a major use case, such as in online major version upgrade.
2. Table level
Allows DDLs on the published tables to be replicated except for
certain edge cases.
This is useful for consolidating multiple databases into a single one,
e.g. for analytics.
3. Other fine-grained levels based on the object type, such as index,
function, procedure and view.
Allows DDLs on certain object types to be replicated. At the moment
I’m unsure of a use case for this.
To implement such DDL replication levels, we need to modify the CREATE
PUBLICATION syntax. For example, to help start the discussion on
the granularity of DDL replication, we can add a new list-valued
option 'ddl' in the WITH definition:
CREATE PUBLICATION mypub FOR ALL TABLES WITH
(publish = 'insert, update, delete, truncate', ddl = 'database');
CREATE PUBLICATION mypub FOR TABLE T1, T2 WITH
(publish = 'insert, update, delete, truncate', ddl = 'table');
CREATE PUBLICATION mypub FOR TABLE T1, T2 WITH
(publish = 'insert, update, delete, truncate', ddl = 'table, function, procedure');
We can probably make "ddl = 'database'" valid only for the FOR ALL
TABLES publication, because it doesn't really make sense to replicate
all DDLs for a database when only a subset of tables is being
replicated (which can cause edge case 1 to occur frequently). "ddl =
'database'" + FOR ALL TABLES is likely where logical replication of
DDL is most useful, i.e. for online major version upgrades.
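Under that restriction, a table-list publication asking for
database-level DDL replication would simply be rejected. A sketch of
the intended behavior (the error text is made up; nothing like this
exists yet):
CREATE PUBLICATION mypub FOR TABLE T1, T2
    WITH (publish = 'insert, update, delete, truncate', ddl = 'database');
-- ERROR:  ddl = 'database' is only supported with FOR ALL TABLES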
Based on the DDL publication levels we can further implement the logic
to conditionally log the DDL commands or to conditionally decode/ship
the logical DDL message.
Thoughts? Your feedback is appreciated.
Thanks,
Zheng Li
Amazon RDS/Aurora for PostgreSQL
| From: | Aleksander Alekseev <aleksander(at)timescale(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-02-22 10:05:05 |
| Message-ID: | CAJ7c6TMu-93kkiJ=OmqFP2wDER-aJM7jLk6TvzmzPQvSEzBiHw@mail.gmail.com |
Hi Zheng,
> I’m working on a patch to support logical replication of data
> definition language statements (DDLs).
That's great!
> However, there are still many edge cases to sort out because not every
> DDL statement can/should be replicated.
Maybe the first implementation shouldn't be perfect as long as known
limitations are documented and the future improvements are unlikely to
break anything for the users. Committing an MVP and iterating on this is
much simpler in terms of development and code review. Also, it will deliver
value to the users and give us feedback sooner.
> 1. DDL involving multiple tables where only some tables are replicated, e.g.
>
> DROP TABLE replicated_foo, notreplicated_bar;
>
I would add DROP TABLE ... CASCADE to the list. Also, let's not forget that
PostgreSQL supports table inheritance and table partitioning.
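A minimal sketch of why CASCADE and inheritance are tricky (object
names made up), where only the parent is meant to be replicated:
CREATE TABLE replicated_parent (id int PRIMARY KEY);
CREATE TABLE local_child (payload text) INHERITS (replicated_parent);
-- Dropping the parent requires CASCADE because of the inheritance
-- link, and CASCADE then also drops local_child and any other
-- dependents.  Replayed on a subscriber, the same text may cascade to
-- a different (or empty) set of objects, so the two sides can diverge.
DROP TABLE replicated_parent CASCADE;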
> 2. Any DDL that calls a volatile function, such as NOW() or RAND(), is
> likely to generate a different value on each replica. It is possible
> to work around these issues—for example, the publisher can replace any
> volatile function calls with a fixed return value when the statement
> is logged so that the subscribers all get the same value. We will have
> to consider some other cases.
That would make sense.
> [...]
> Whether a DDL should be replicated also depends on what granularity do
> we define DDL replication. For example, we can define DDL replication
> on these levels:
>
> 1. Database level
> Allows all DDLs for a database to be replicated except for certain
> edge cases (refer to the edge cases mentioned above).
> This is likely a major use case, such as in online major version upgrade.
To my knowledge, this is not a primary use case for logical replication.
Also, I suspect that implementing it may be a bit challenging. What if we
focus on table-level replication for now?
--
Best regards,
Aleksander Alekseev
| From: | Marcos Pegoraro <marcos(at)f10(dot)com(dot)br> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-02-22 12:00:36 |
| Message-ID: | CAB-JLwYkc6T4PX1OD3r9Zm-39OVpZ5+Ev4dvLYPkMT2EJEgyEA@mail.gmail.com |
On Mon, Feb 21, 2022 at 13:13, Zheng Li <zhengli10(at)gmail(dot)com>
wrote:
>
> 2. Table level
> Allows DDLs on the published tables to be replicated except for
> certain edge cases.
>
Think about how to handle triggers and functions with the same name
but a different purpose.
Publisher
create function public.audit() returns trigger language plpgsql as $$
begin
  new.Audit_User := current_user;
  new.Audit_Date_Time := now();
  return new;
end;$$;
create trigger audit before insert or update on foo
  for each row execute procedure public.audit();
Subscriber
create function public.audit() returns trigger language plpgsql as $$
begin
  insert into Audit(Audit_Date_Time, Audit_User, Schema_Name, Table_Name,
                    Audit_Action, Field_Values)
  values (new.Audit_ts, new.Audit_User, tg_table_schema, tg_table_name,
          tg_op, row_to_json(case when tg_op = 'DELETE' then old else new end));
  return null;
end;$$;
create trigger audit after insert or update or delete on foo
  for each row execute procedure public.audit();
regards,
Marcos
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Aleksander Alekseev <aleksander(at)timescale(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-02-24 03:10:32 |
| Message-ID: | CAAD30UJPrvJunzq1H-xVf728-9PJUxUg7kmNEoHtHi=cNaofGw@mail.gmail.com |
Hi Aleksander,
>That's great!
Thanks!
>Maybe the first implementation shouldn't be perfect as long as known
>limitations are documented and the future improvements are unlikely to
>break anything for the users. Committing an MVP and iterating on this is
>much simpler in terms of development and code review. Also, it will deliver
>value to the users and give us feedback sooner.
Agreed, I’ll post the prototype when it’s in a bit better shape. I’d
also like to hear some more feedback from the community on whether I’m
heading in the right direction.
>I would add DROP TABLE ... CASCADE to the list. Also, let's not forget that
>PostgreSQL supports table inheritance and table partitioning.
Thanks, I will keep this in mind.
>> 1. Database level
>> Allows all DDLs for a database to be replicated except for certain
>> edge cases (refer to the edge cases mentioned above).
>> This is likely a major use case, such as in online major version upgrade.
>To my knowledge, this is not a primary use case for logical replication.
>Also, I suspect that implementing it may be a bit challenging. What if we
>focus on table-level replication for now?
I think that is because the current limitations of logical replication
are holding it back from being used for major version upgrade (MVU).
Online / reduced-downtime MVU is a popular request from customers,
which is why we should enhance logical replication to support this use
case.
Also, I think table-level DDL replication is actually more
challenging, especially in the FOR TABLE case, because differences are
expected to occur between the source and target databases; Marcos’
comment also illustrates the complexity in this case. Database-level
DDL replication in the FOR ALL TABLES case, on the other hand, is
relatively simple because the source and target databases are (almost)
identical.
Regards,
Zheng
| From: | Aleksander Alekseev <aleksander(at)timescale(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-02-24 08:56:36 |
| Message-ID: | CAJ7c6TP2=_k-w0rDrqbn6A40qEQcZ=kwVf18+YsfowbFL0o3Pg@mail.gmail.com |
Hi Zheng,
> >Also, I suspect that implementing it may be a bit challenging. What if we
> >focus on table-level replication for now?
>
> I think it is due to the fact that the current limitations in logical
> replication are
> holding it back in major version upgrade (MVU). Online / reduced downtime MVU
> is a popular request from customers, and why we should enhance logical
> replication
> to support this use case.
>
> Also I think table-level DDL replication is actually more challenging,
> especially in
> the FOR TABLE case, due to the fact that differences are expected to
> occur between the
> source and target database. Marcos’ comment also justifies the complexity
> in this case. Whereas database-level DDL replication in the FOR ALL
> TABLE case is
> relatively simple because the source and target database are (almost) identical.
Sure, I don't insist on implementing table-level replication first.
It's up to you. My point was that it's not necessary to implement
everything at once.
--
Best regards,
Aleksander Alekseev
| From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-13 11:35:18 |
| Message-ID: | CAFiTN-tM1zOk50xoeP5vdzaO0U_iQ3u8oZ2P1L7J=0ka+c7Utw@mail.gmail.com |
On Mon, Feb 21, 2022 at 9:43 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> Hello,
>
> One of the most frequently requested improvements from our customers
> is to reduce downtime associated with software updates (both major and
> minor versions). To do this, we have reviewed potential contributions to
> improving logical replication.
>
> I’m working on a patch to support logical replication of data
> definition language statements (DDLs). This is a useful feature when a
> database in logical replication has lots of tables, functions and
> other objects that change over time, such as in online cross major
> version upgrade.
+1
> I put together a prototype that replicates DDLs using the generic
> messages for logical decoding. The idea is to log the candidate DDL
> string in ProcessUtilitySlow() using LogLogicalMessge() with a new
> flag in WAL record type xl_logical_message indicating it’s a DDL
> message. The xl_logical_message record is decoded and sent to the
> subscriber via pgoutput. The logical replication worker process is
> dispatched for this new DDL message type and executes the command
> accordingly.
If you don't mind, would you like to share the POC or the branch for this work?
> However, there are still many edge cases to sort out because not every
> DDL statement can/should be replicated. Some of these include:
> 3. CREATE TABLE AS and SELECT INTO, For example:
>
> CREATE TABLE foo AS
> SELECT field_1, field_2 FROM bar;
>
> There are a few issues that can occur here. For one, it’s possible
> that table bar doesn't exist on the subscriber. Even if “bar” does
> exist, it may not be fully up-to-date with the publisher, which would
> cause a data mismatch on “foo” between the publisher and subscriber.
In such cases why don't we just log the table creation WAL for the DDL
instead of the complete statement which creates the table and inserts
the tuples? Since we are already WAL-logging the individual inserts,
once you make sure the table creation is replicated, I think the exact
data insertion on the subscriber side will be taken care of by the
insert WALs, no?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
| From: | rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com> |
|---|---|
| To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-15 06:27:28 |
| Message-ID: | CADgiWi5aDwV+o+P_c62pLBxsZ01BK-R8KbmuPbPOfrAwAw4ptA@mail.gmail.com |
On Sun, Mar 13, 2022 at 5:05 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Mon, Feb 21, 2022 at 9:43 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> >
> > Hello,
> >
> > One of the most frequently requested improvements from our customers
> > is to reduce downtime associated with software updates (both major and
> > minor versions). To do this, we have reviewed potential contributions to
> > improving logical replication.
> >
> > I’m working on a patch to support logical replication of data
> > definition language statements (DDLs). This is a useful feature when a
> > database in logical replication has lots of tables, functions and
> > other objects that change over time, such as in online cross major
> > version upgrade.
>
> +1
+1
> > I put together a prototype that replicates DDLs using the generic
> > messages for logical decoding. The idea is to log the candidate DDL
> > string in ProcessUtilitySlow() using LogLogicalMessge() with a new
> > flag in WAL record type xl_logical_message indicating it’s a DDL
> > message. The xl_logical_message record is decoded and sent to the
> > subscriber via pgoutput. The logical replication worker process is
> > dispatched for this new DDL message type and executes the command
> > accordingly.
>
> If you don't mind, would you like to share the POC or the branch for this work?
I would love to try this patch out. Would you like to share the branch or POC?
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-16 21:17:18 |
| Message-ID: | CAAD30ULtoGp8L_GKbV15Wnm+X5r=SE7MOnYHuqBr396m26jJSA@mail.gmail.com |
Hi,
>If you don't mind, would you like to share the POC or the branch for this work?
The POC patch is attached. It currently supports the following functionalities:
1. Configure either database level or table level DDL replication via
the CREATE PUBLICATION command.
2. Supports replication of DDLs of the following types when
database-level DDL replication is turned on. Other less common DDL
types could be added later.
T_CreateSchemaStmt
T_CreateStmt
T_CreateForeignTableStmt
T_AlterDomainStmt
T_DefineStmt
T_CompositeTypeStmt
T_CreateEnumStmt
T_CreateRangeStmt
T_AlterEnumStmt
T_ViewStmt
T_CreateFunctionStmt
T_AlterFunctionStmt
T_CreateTrigStmt
T_CreateDomainStmt
T_CreateCastStmt
T_CreateOpClassStmt
T_CreateOpFamilyStmt
T_AlterOpFamilyStmt
T_AlterOperatorStmt
T_AlterTypeStmt
T_GrantStmt
T_AlterCollationStmt
T_AlterTableStmt
T_IndexStmt
3. Supports replication of DDLs of the following types if only
table-level DDL replication is turned on.
T_AlterTableStmt
T_IndexStmt
4. Supports seamless DML replication of new tables created via DDL
replication without having to manually run "ALTER SUBSCRIPTION ...
REFRESH PUBLICATION" on the subscriber.
Here is a demo:
source_db=# create publication mypub FOR ALL TABLES with (ddl = 'database');
target_db=# create subscription mysub CONNECTION 'dbname=source_db
host=localhost user=myuser port=5432' PUBLICATION mypub;
source_db=#
BEGIN;
CREATE TABLE foo (a int);
CREATE INDEX foo_idx ON foo (a);
ALTER TABLE foo ADD COLUMN b timestamptz;
CREATE FUNCTION foo_ts()
RETURNS trigger AS $$
BEGIN
NEW.b := current_timestamp;
RETURN NEW;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER foo_ts BEFORE INSERT OR UPDATE ON foo
FOR EACH ROW EXECUTE FUNCTION foo_ts();
INSERT INTO foo VALUES (1);
COMMIT;
source_db=# select * from foo;
a | b
---+-------------------------------
1 | 2022-03-15 19:07:53.005215+00
(1 row)
target_db=# select * from foo;
a | b
---+-------------------------------
1 | 2022-03-15 19:07:53.005215+00
(1 row)
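A quick way to confirm on the subscriber that the schema objects
themselves (not just the row) came across; these are ordinary catalog
queries, shown here only as a suggestion for checking the demo:
target_db=# \d foo
-- should show column b, index "foo_idx" and trigger foo_ts, all of
-- which arrived purely through DDL replication
target_db=# SELECT tgname FROM pg_trigger WHERE tgrelid = 'foo'::regclass;
-- expected to return foo_ts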
Here is the remaining work for this patch:
1. Test and handle corner case DDLs, such as the ones listed in my
first email[1].
2. Investigate less common DDL types (categorized in function
LogLogicalDDLCommand) and decide if and under what conditions they
should be replicated.
3. Determine if we want to include table-level DDL replication. If so,
we need to continue to enhance and test this feature.
I think this will be ready to consider for commit once 1 and 2 are
resolved; 3 can be done in a second phase.
If you are reviewing the code, I’ve provided a detailed description of
the patch below:
1. Syntax, pg_publication and pg_dump change:
Allows the user to configure either database-level or table-level DDL
replication via the CREATE PUBLICATION command as proposed in my first
email. Two new columns are added to the pg_publication catalog to show
the DDL replication levels (see the query sketch after point 6), and
the test output publication.out is updated accordingly. pg_dump is
also modified to accommodate the pg_publication catalog change.
2. Logical logging change
a. A new WAL record type xl_logical_ddl_message is introduced to
support logical logging of the DDL command. xl_logical_ddl_message is
similar to the existing xl_logical_message used for generic message
logging. The reason for not using xl_logical_message directly, as
proposed initially, is that I found we need to log more information
(such as the user role, search path and potentially more in the
future) than just one string, and we don't want to make too many
changes to the existing xl_logical_message, which might break its
current consumers.
b. The logging of DDL command string is processed in function
LogLogicalDDLCommand. We categorize DDL command types into three
categories in this function:
1. replicated in database level replication only (such as CREATE
TABLE, CREATE FUNCTION).
2. replicated in database or table level replication depending on the
configuration (such as ALTER TABLE).
3. not supported for replication or pending investigation.
3. Logical decoding and Reorderbuffer change
Supports logical decoding of the new WAL record
xl_logical_ddl_message. This is similar to the logical decoding of
xl_logical_message. Tests for this change are added in the
test_decoding plugin.
4. Integration with pgoutput
Supports sending and receiving the DDL message using the logical
replication wire protocol. A new LogicalRepMsgType is introduced for
this purpose: LOGICAL_REP_MSG_DDLMESSAGE = 'L'.
5. Logical replication worker change
Supports execution of the DDL command under the original user role and
search_path. For any new table created this way, we also set its
srsubstate in the pg_subscription_rel catalog to SUBREL_STATE_INIT, so
that DML replication can progress on this new table without manually
running "ALTER SUBSCRIPTION ... REFRESH PUBLICATION" (also illustrated
in the query sketch after point 6).
6. TAP test
A new TAP test 030_rep_ddl.pl is added. We have mainly focused on
testing the happy path of database-level replication so far.
Corner-case DDLs and table-level DDL replication are still to be
carefully tested.
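To make the catalog-level effects of points 1 and 5 above easier to
check, here are the kinds of queries a reviewer could run; the
pg_publication column names are taken from this description and should
be treated as assumptions until verified against the patch:
-- Publisher: the two new pg_publication columns recording the DDL
-- replication level (names assumed):
SELECT pubname, pubddl_database, pubddl_table FROM pg_publication;
-- Subscriber: a table created through replicated DDL should appear in
-- pg_subscription_rel in the init state ('i'), so DML replication can
-- proceed without ALTER SUBSCRIPTION ... REFRESH PUBLICATION:
SELECT srrelid::regclass, srsubstate FROM pg_subscription_rel;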
>> However, there are still many edge cases to sort out because not every
>> DDL statement can/should be replicated. Some of these include:
>> 3. CREATE TABLE AS and SELECT INTO, For example:
>>
>> CREATE TABLE foo AS
>> SELECT field_1, field_2 FROM bar;
>>
>> There are a few issues that can occur here. For one, it’s possible
>> that table bar doesn't exist on the subscriber. Even if “bar” does
>> exist, it may not be fully up-to-date with the publisher, which would
>> cause a data mismatch on “foo” between the publisher and subscriber.
>In such cases why don't we just log the table creation WAL for DDL
>instead of a complete statement which creates the table and inserts
>the tuple? Because we are already WAL logging individual inserts and
>once you make sure of replicating the table creation I think the exact
>data insertion on the subscriber side will be taken care of by the
>insert WALs no?
The table creation WAL and table insert WAL are available. The tricky
part is how we break this command down into two parts (a normal
CREATE TABLE followed by insertions), either from the parse tree or
from the WAL records. I'll have to dig into this more.
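A sketch of the decomposition being discussed, with made-up column
types since the real definition would have to come from the parse tree
or the catalogs:
-- Instead of replaying the original statement on the subscriber:
--   CREATE TABLE foo AS SELECT field_1, field_2 FROM bar;
-- the publisher would log only the resulting table definition ...
CREATE TABLE foo (field_1 int, field_2 text);
-- ... and the rows reach the subscriber through the ordinary insert
-- WAL / DML replication path, just like for any other published table.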
I look forward to your feedback.
Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| ddl_replication.patch1 | application/octet-stream | 149.6 KB |
| From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-16 22:06:18 |
| Message-ID: | [email protected] |
Hello
I think this is a pretty interesting and useful feature.
Did you see some old code I wrote towards this goal?
https://www.postgresql.org/message-id/[email protected]
The intention was that DDL would produce some JSON blob that accurately
describes the DDL that was run; the caller can acquire that and use it
to produce working DDL that doesn't depend on runtime conditions. There
was lots of discussion on doing things this way. It was ultimately
abandoned, but I think it's valuable.
--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
"Porque francamente, si para saber manejarse a uno mismo hubiera que
rendir examen... ¿Quién es el machito que tendría carnet?" (Mafalda)
| From: | Japin Li <japinli(at)hotmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-17 00:08:19 |
| Message-ID: | MEYP282MB1669F966875F395565BB3728B6129@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM |
Hi, Zheng Li
On Thu, 17 Mar 2022 at 05:17, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> Hi,
>
>>If you don't mind, would you like to share the POC or the branch for this work?
>
> The POC patch is attached. It currently supports the following functionalities:
> 1. Configure either database level or table level DDL replication via
> the CREATE PUBLICATION command.
>
> 2.Supports replication of DDL of the following types when database
> level DDL replication is turned on. Other less common DDL types could
> be added later.
> T_CreateSchemaStmt
> T_CreateStmt
> T_CreateForeignTableStmt
> T_AlterDomainStmt
> T_DefineStmt
> T_CompositeTypeStmt
> T_CreateEnumStmt
> T_CreateRangeStmt
> T_AlterEnumStmt
> T_ViewStmt
> T_CreateFunctionStmt
> T_AlterFunctionStmt
> T_CreateTrigStmt
> T_CreateDomainStmt
> T_CreateCastStmt
> T_CreateOpClassStmt
> T_CreateOpFamilyStmt
> T_AlterOpFamilyStmt
> T_AlterOperatorStmt
> T_AlterTypeStmt
> T_GrantStmt
> T_AlterCollationStmt
> T_AlterTableStmt
> T_IndexStmt
>
> 3.Supports replication of DDLs of the following types if only table
> level DDL replication is turned on.
> T_AlterTableStmt
> T_IndexStmt
>
> 4.Supports seamless DML replication of new tables created via DDL
> replication without having to manually running “ALTER SUBSCRIPTION ...
> REFRESH PUBLICATION" on the subscriber.
>
Great! I think we can split the patch into several pieces (at least
implementation and test cases) for easier review.
+ * Simiarl to the generic logical messages, These DDL messages can be either
+ * transactional or non-transactional. Note by default DDLs in PostgreSQL are
+ * transactional.
There is a typo, s/Simiarl/Similar/.
--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
| From: | Japin Li <japinli(at)hotmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-17 15:56:01 |
| Message-ID: | MEYP282MB16690FB2E6DF4CB5AB0E826EB6129@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM |
On Thu, 17 Mar 2022 at 05:17, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> Hi,
>
>>If you don't mind, would you like to share the POC or the branch for this work?
>
> The POC patch is attached. It currently supports the following functionalities:
Hi,
When I try to run the regression tests, there are some errors.
make[2]: Entering directory '/home/px/Codes/postgresql/Debug/src/test/subscription'
rm -rf '/home/px/Codes/postgresql/Debug/src/test/subscription'/tmp_check && /usr/bin/mkdir -p '/home/px/Codes/postgresql/Debug/src/test/subscription'/tmp_check && cd /home/px/Codes/postgresql/Debug/../src
/test/subscription && TESTDIR='/home/px/Codes/postgresql/Debug/src/test/subscription' PATH="/home/px/Codes/postgresql/Debug/tmp_install/home/px/Codes/postgresql/Debug/pg/bin:/home/px/Codes/postgresql/Debu
g/src/test/subscription:$PATH" LD_LIBRARY_PATH="/home/px/Codes/postgresql/Debug/tmp_install/home/px/Codes/postgresql/Debug/pg/lib:$LD_LIBRARY_PATH" PGPORT='65432' PG_REGRESS='/home/px/Codes/postgresql/De
bug/src/test/subscription/../../../src/test/regress/pg_regress' /usr/bin/prove -I /home/px/Codes/postgresql/Debug/../src/test/perl/ -I /home/px/Codes/postgresql/Debug/../src/test/subscription t/*.pl
t/001_rep_changes.pl ............... ok
t/002_types.pl ..................... ok
t/003_constraints.pl ............... ok
t/004_sync.pl ...................... 6/? # Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 29 just after 7.
t/004_sync.pl ...................... Dubious, test returned 29 (wstat 7424, 0x1d00)
All 7 subtests passed
t/005_encoding.pl .................. ok
t/006_rewrite.pl ................... 1/? # poll_query_until timed out executing this query:
# SELECT '0/14A8648' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'mysub';
# expecting this output:
# t
# last actual query output:
#
# with stderr:
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 29 just after 1.
t/006_rewrite.pl ................... Dubious, test returned 29 (wstat 7424, 0x1d00)
All 1 subtests passed
t/007_ddl.pl ....................... ok
t/008_diff_schema.pl ............... 1/? # Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 29 just after 4.
t/008_diff_schema.pl ............... Dubious, test returned 29 (wstat 7424, 0x1d00)
All 4 subtests passed
t/009_matviews.pl .................. Dubious, test returned 29 (wstat 7424, 0x1d00)
No subtests run
The new patch changes the behavior of publication: when we drop a table
on the publisher, the table is also dropped on the subscriber, and this
makes the regression tests fail, since the regression tests try to drop
the table on the subscriber.
IMO, we can disable DDL replication by default, which makes the
regression tests happy. Any thoughts?
--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Japin Li <japinli(at)hotmail(dot)com> |
| Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-17 16:22:46 |
| Message-ID: | CAAD30UKRUusq8JyyHzAv71=ncN22OE8OkOOyAWvRHW3wXNjyyA@mail.gmail.com |
Hello Alvaro,
> I think this is a pretty interesting and useful feature.
>
> Did you see some old code I wrote towards this goal?
> https://www.postgresql.org/message-id/[email protected]
> The intention was that DDL would produce some JSON blob that accurately
> describes the DDL that was run; the caller can acquire that and use it
> to produce working DDL that doesn't depend on runtime conditions. There
> was lots of discussion on doing things this way. It was ultimately
> abandoned, but I think it's valuable.
Thanks for pointing this out. I'm looking into it.
Hello Japin,
>The new patch change the behavior of publication, when we drop a table
>on publisher, the table also be dropped on subscriber, and this made the
>regression testa failed, since the regression test will try to drop the
>table on subscriber.
>IMO, we can disable the DDLs replication by default, which makes the
>regression test happy. Any thoughts?
Good catch, I forgot to run the whole TAP test suite. I think for now
we can simply disable DDL replication for the failing tests when
creating the publication:
$node_publisher->safe_psql('postgres',
    "CREATE PUBLICATION tap_pub FOR ALL TABLES WITH (ddl='')");
I'll fix it in the next version of the patch. I'll also work on
breaking the patch down into smaller pieces for ease of review.
Regards,
Zheng
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Japin Li <japinli(at)hotmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-18 00:18:18 |
| Message-ID: | CAAD30UKTp87+kvGZYL3M2Suxq=WEvFUG24ZRT0yT9rqdkP=uMA@mail.gmail.com |
Hello,
Attached please find the broken down patch set. Also fixed the failing
TAP tests Japin reported.
Regards,
Zheng Li
Amazon RDS/Aurora for PostgreSQL
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-syntax-pg_publication-pg_dump-ddl_replication.patch | application/octet-stream | 62.6 KB |
| 0002-logical_decoding_ddl_message.patch | application/octet-stream | 57.5 KB |
| 0003-pgoutput-worker-ddl_replication.patch | application/octet-stream | 33.7 KB |
| From: | Japin Li <japinli(at)hotmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-18 00:22:35 |
| Message-ID: | MEYP282MB16698BCAA678137608F5DBA0B6139@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM |
On Fri, 18 Mar 2022 at 00:22, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> Hello Japin,
>>The new patch change the behavior of publication, when we drop a table
>>on publisher, the table also be dropped on subscriber, and this made the
>>regression testa failed, since the regression test will try to drop the
>>table on subscriber.
>
>>IMO, we can disable the DDLs replication by default, which makes the
>>regression test happy. Any thoughts?
>
> Good catch, I forgot to run the whole TAP test suite. I think for now
> we can simply disable DDL replication for the failing tests when
> creating publication: $node_publisher->safe_psql('postgres',
> "CREATE PUBLICATION tap_pub FOR ALL TABLES WITH (ddl='')");
> I'll fix it in the next version of patch. I'll also work on breaking
> down the patch into smaller pieces for ease of review.
>
Oh, it doesn't work.
postgres=# CREATE PUBLICATION s1 FOR ALL TABLES WITH(ddl = '');;
ERROR: conflicting or redundant options
Here is the code; I think you might mean `if (ddl_level_given == NULL)`.
+ if (*ddl_level_given)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
+
+ /*
+ * If publish option was given only the explicitly listed actions
+ * should be published.
+ */
+ pubactions->pubddl_database = false;
+ pubactions->pubddl_table = false;
+
+ *ddl_level_given = true;
--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
| From: | Japin Li <japinli(at)hotmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-18 00:25:24 |
| Message-ID: | MEYP282MB166989FCDD5574569D5D011DB6139@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM |
On Fri, 18 Mar 2022 at 08:18, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> Hello,
>
> Attached please find the broken down patch set. Also fixed the failing
> TAP tests Japin reported.
>
Thanks for updating the patchset, I will try it later.
--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
| From: | Japin Li <japinli(at)hotmail(dot)com> |
|---|---|
| To: | Japin Li <japinli(at)hotmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-18 00:38:33 |
| Message-ID: | MEYP282MB1669DDF788C623B7F8B64C2EB6139@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM |
On Fri, 18 Mar 2022 at 08:22, Japin Li <japinli(at)hotmail(dot)com> wrote:
> On Fri, 18 Mar 2022 at 00:22, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>> Hello Japin,
>>>The new patch change the behavior of publication, when we drop a table
>>>on publisher, the table also be dropped on subscriber, and this made the
>>>regression testa failed, since the regression test will try to drop the
>>>table on subscriber.
>>
>>>IMO, we can disable the DDLs replication by default, which makes the
>>>regression test happy. Any thoughts?
>>
>> Good catch, I forgot to run the whole TAP test suite. I think for now
>> we can simply disable DDL replication for the failing tests when
>> creating publication: $node_publisher->safe_psql('postgres',
>> "CREATE PUBLICATION tap_pub FOR ALL TABLES WITH (ddl='')");
>> I'll fix it in the next version of patch. I'll also work on breaking
>> down the patch into smaller pieces for ease of review.
>>
>
> Oh, it doesn't work.
>
> postgres=# CREATE PUBLICATION s1 FOR ALL TABLES WITH(ddl = '');;
> ERROR: conflicting or redundant options
>
>
> Here is the code, I think, you might mean `if (ddl_level_given == NULL)`.
>
> + if (*ddl_level_given)
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("conflicting or redundant options")));
> +
> + /*
> + * If publish option was given only the explicitly listed actions
> + * should be published.
> + */
> + pubactions->pubddl_database = false;
> + pubactions->pubddl_table = false;
> +
> + *ddl_level_given = true;
Oh, sorry, I misunderstood it; it just looks like you forgot to
initialize *ddl_level_given to false when entering
parse_publication_options().
--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
| From: | Japin Li <japinli(at)hotmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-18 10:19:38 |
| Message-ID: | MEYP282MB1669863D5C31D7F6A1D996D8B6139@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM |
On Fri, 18 Mar 2022 at 08:18, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> Hello,
>
> Attached please find the broken down patch set. Also fixed the failing
> TAP tests Japin reported.
>
Here are some comments for the new patches:
Seems like you forgot to initialize *ddl_level_given after entering
parse_publication_options(); see [1].
+ if (*ddl_level_given)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
We can use errorConflictingDefElem() instead of the ereport() to make
the error message more readable.
I also think that ddl = '' isn't a good way to disable DDL replication;
how about using ddl = 'none' or something else?
The test_decoding test case does not work as expected; see below:
diff -U3 /home/px/Codes/postgresql/contrib/test_decoding/expected/ddlmessages.out /home/px/Codes/postgresql/Debug/contrib/test_decoding/results/ddlmessages.out
--- /home/px/Codes/postgresql/contrib/test_decoding/expected/ddlmessages.out 2022-03-18 08:46:57.653922671 +0800
+++ /home/px/Codes/postgresql/Debug/contrib/test_decoding/results/ddlmessages.out 2022-03-18 17:34:33.411563601 +0800
@@ -25,8 +25,8 @@
SELECT pg_drop_replication_slot('regression_slot');
DROP TABLE tab1;
DROP publication mypub;
- data
----------------------------------------------------------------------------------------------------------------------------------------------------
+ data
+-----------------------------------------------------------------------------------------------------------------------------------------------
DDL message: transactional: 1 prefix: role: redacted, search_path: "$user", public, sz: 47 content:CREATE TABLE tab1 (id serial unique, data int);
BEGIN
sequence public.tab1_id_seq: transactional:1 last_value: 1 log_cnt: 0 is_called:0
Since the DDL message contains the username, we try to replace the
username with 'redacted' to avoid this problem:
\o | sed 's/role.*search_path/role: redacted, search_path/g'
However, the title and dash lines may have different lengths for
different usernames. To avoid this problem, how about using a fixed
username when initializing the database for this regression test?
I tried to disable the ddlmessage regression test, and then got:
make[2]: Entering directory '/home/px/Codes/postgresql/Debug/src/bin/pg_dump'
rm -rf '/home/px/Codes/postgresql/Debug/src/bin/pg_dump'/tmp_check && /usr/bin/mkdir -p '/home/px/Codes/postgresql/Debug/src/bin/pg_dump'/tmp_check && cd /home/px/Codes/postgresql/Debug/../src/bin/pg_dump
&& TESTDIR='/home/px/Codes/postgresql/Debug/src/bin/pg_dump' PATH="/home/px/Codes/postgresql/Debug/tmp_install/home/px/Codes/postgresql/Debug/pg/bin:/home/px/Codes/postgresql/Debug/src/bin/pg_dump:$PATH"
LD_LIBRARY_PATH="/home/px/Codes/postgresql/Debug/tmp_install/home/px/Codes/postgresql/Debug/pg/lib:$LD_LIBRARY_PATH" PGPORT='65432' PG_REGRESS='/home/px/Codes/postgresql/Debug/src/bin/pg_dump/../../../s
rc/test/regress/pg_regress' /usr/bin/prove -I /home/px/Codes/postgresql/Debug/../src/test/perl/ -I /home/px/Codes/postgresql/Debug/../src/bin/pg_dump t/*.pl
t/001_basic.pl ................ ok
t/002_pg_dump.pl .............. 13/?
# Failed test 'binary_upgrade: should dump CREATE PUBLICATION pub1'
# at t/002_pg_dump.pl line 3916.
# Review binary_upgrade results in /home/px/Codes/postgresql/Debug/src/bin/pg_dump/tmp_check/tmp_test_jgRI
# Failed test 'binary_upgrade: should dump CREATE PUBLICATION pub2'
# at t/002_pg_dump.pl line 3916.
# Review binary_upgrade results in /home/px/Codes/postgresql/Debug/src/bin/pg_dump/tmp_check/tmp_test_jgRI
# Failed test 'binary_upgrade: should dump CREATE PUBLICATION pub3'
# at t/002_pg_dump.pl line 3916.
# Review binary_upgrade results in /home/px/Codes/postgresql/Debug/src/bin/pg_dump/tmp_check/tmp_test_jgRI
# Failed test 'binary_upgrade: should dump CREATE PUBLICATION pub4'
# at t/002_pg_dump.pl line 3916.
# Review binary_upgrade results in /home/px/Codes/postgresql/Debug/src/bin/pg_dump/tmp_check/tmp_test_jgRI
t/002_pg_dump.pl .............. 240/?
[...]
# Review section_post_data results in /home/px/Codes/postgresql/Debug/src/bin/pg_dump/tmp_check/tmp_test_jgRI
t/002_pg_dump.pl .............. 7258/? # Looks like you failed 84 tests of 7709.
t/002_pg_dump.pl .............. Dubious, test returned 84 (wstat 21504, 0x5400)
Failed 84/7709 subtests
t/003_pg_dump_with_server.pl .. ok
t/010_dump_connstr.pl ......... ok
Test Summary Report
-------------------
t/002_pg_dump.pl (Wstat: 21504 Tests: 7709 Failed: 84)
Failed tests: 136-139, 362-365, 588-591, 1040-1043, 1492-1495
1719-1722, 1946-1949, 2177-2180, 2407-2410
2633-2636, 2859-2862, 3085-3088, 3537-3540
3763-3766, 3989-3992, 4215-4218, 4441-4444
5119-5122, 5345-5348, 6702-6705, 7154-7157
Non-zero exit status: 84
Files=4, Tests=7808, 26 wallclock secs ( 0.46 usr 0.05 sys + 7.98 cusr 2.80 csys = 11.29 CPU)
Result: FAIL
make[2]: *** [Makefile:55: check] Error 1
make[2]: Leaving directory '/home/px/Codes/postgresql/Debug/src/bin/pg_dump'
make[1]: *** [Makefile:43: check-pg_dump-recurse] Error 2
make[1]: Leaving directory '/home/px/Codes/postgresql/Debug/src/bin'
make: *** [GNUmakefile:71: check-world-src/bin-recurse] Error 2
And, when I try to use git am to apply the patch, it complains:
$ git am ~/0001-syntax-pg_publication-pg_dump-ddl_replication.patch
Patch format detection failed.
--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Japin Li <japinli(at)hotmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-18 17:25:09 |
| Message-ID: | CAAD30UKc=GiGQzE8H7+Ofo18hwMOfK4qUm_KUyw6c09q4JvA5Q@mail.gmail.com |
Hello,
Thanks for the quick review!
> And, when I try to use git am to apply the patch, it complains:
>
> $ git am ~/0001-syntax-pg_publication-pg_dump-ddl_replication.patch
> Patch format detection failed.
git apply works for me. I'm not sure why git am complains.
I also created a git branch to make code sharing easier, please try this out:
https://github.com/zli236/postgres/tree/ddl_replication
> Seems like you forget initializing the *ddl_level_given after entering the
> parse_publication_options(), see [1].
>
>
> + if (*ddl_level_given)
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("conflicting or redundant options")));
>
> We can use the errorConflictingDefElem() to replace the ereport() to make the
> error message more readable.
Agreed. Fixed in the latest version.
> I also think that ddl = '' isn't a good way to disable DDL replication, how
> about using ddl = 'none' or something else?
The syntax follows the existing convention of the WITH publish option.
For example, CREATE PUBLICATION mypub WITH (publish = '') means don't
publish any DML changes. So I'd leave it as is for now.
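For comparison, the two options would then behave symmetrically; the
ddl option here is of course the patch's proposed syntax, not something
PostgreSQL accepts today:
CREATE PUBLICATION p_no_dml FOR ALL TABLES WITH (publish = '');  -- no DML published
CREATE PUBLICATION p_no_ddl FOR ALL TABLES WITH (ddl = '');      -- DML as usual, no DDL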
> The test_decoding test case cannot work as expected, see below:
.....
> DDL message: transactional: 1 prefix: role: redacted, search_path: "$user", public, sz: 47 content:CREATE TABLE tab1 (id serial unique, data int);
> BEGIN
> sequence public.tab1_id_seq: transactional:1 last_value: 1 log_cnt: 0 is_called:0
>
> Since the DDL message contains the username, and we try to replace the username with 'redacted' to avoid this problem,
>
> \o | sed 's/role.*search_path/role: redacted, search_path/g'
>
> However, the title and dash lines may have different lengths for different
> usernames. To avoid this problem, how about using a specified username when
> initializing the database for this regression test?
I don't understand the question; do you have an example of when the
test doesn't work? It runs fine for me.
> t/002_pg_dump.pl .............. 13/?
> # Failed test 'binary_upgrade: should dump CREATE PUBLICATION pub1'
> # at t/002_pg_dump.pl line 3916.
> # Review binary_upgrade results in /home/px/Codes/postgresql/Debug/src/bin/pg_dump/tmp_check/tmp_test_jgRI
......
> Failed 84/7709 subtests
> t/003_pg_dump_with_server.pl .. ok
> t/010_dump_connstr.pl ......... ok
>
> Test Summary Report
> -------------------
> t/002_pg_dump.pl (Wstat: 21504 Tests: 7709 Failed: 84)
This is fixed in the latest version. I need to remind myself to run
make check-world in the future.
Regards,
Zheng Li
| From: | Zheng Li <zhelli(at)amazon(dot)com> |
|---|---|
| To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Cc: | Zheng Li <zhelli(at)amazon(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-18 17:29:07 |
| Message-ID: | 164762454752.1210.2939511148604659322.pgcf@coridan.postgresql.org |
git branch of the change:
https://github.com/zli236/postgres/tree/ddl_replication
| From: | Japin Li <japinli(at)hotmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-18 23:33:29 |
| Message-ID: | MEYP282MB16691E383140844437FB0633B6139@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM |
On Sat, 19 Mar 2022 at 01:25, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> Hello,
>
> Thanks for the quick review!
>
>> And, when I try to use git am to apply the patch, it complains:
>>
>> $ git am ~/0001-syntax-pg_publication-pg_dump-ddl_replication.patch
>> Patch format detection failed.
>
> git apply works for me. I'm not sure why git am complains.
> I also created a git branch to make code sharing easier, please try this out:
> https://github.com/zli236/postgres/tree/ddl_replication
>
Yeah, I can use git apply; I'm not sure how you generated the patch.
>> I also think that ddl = '' isn't a good way to disable DDL replication, how
>> about using ddl = 'none' or something else?
>
> The syntax follows the existing convention of the WITH publish option.
> For example in CREATE PUBLICATION mypub WITH (publish = '')
> it means don't publish any DML change. So I'd leave it as is for now.
>
Agreed.
>> The test_decoding test case cannot work as expected, see below:
> .....
>> DDL message: transactional: 1 prefix: role: redacted, search_path: "$user", public, sz: 47 content:CREATE TABLE tab1 (id serial unique, data int);
>> BEGIN
>> sequence public.tab1_id_seq: transactional:1 last_value: 1 log_cnt: 0 is_called:0
>>
>> Since the DDL message contains the username, and we try to replace the username with 'redacted' to avoid this problem,
>>
>> \o | sed 's/role.*search_path/role: redacted, search_path/g'
>>
>> However, the title and dash lines may have different lengths for different
>> usernames. To avoid this problem, how about using a specified username when
>> initializing the database for this regression test?
>
> I don't understand the question, do you have an example of when the
> test doesn't work? It runs fine for me.
>
You should use a different user whose name has a different length from
your current one. For example:
px(at)localhost$ make check-world
>
>> t/002_pg_dump.pl .............. 13/?
>> # Failed test 'binary_upgrade: should dump CREATE PUBLICATION pub1'
>> # at t/002_pg_dump.pl line 3916.
>> # Review binary_upgrade results in /home/px/Codes/postgresql/Debug/src/bin/pg_dump/tmp_check/tmp_test_jgRI
> ......
>> Failed 84/7709 subtests
>> t/003_pg_dump_with_server.pl .. ok
>> t/010_dump_connstr.pl ......... ok
>>
>> Test Summary Report
>> -------------------
>> t/002_pg_dump.pl (Wstat: 21504 Tests: 7709 Failed: 84)
>
> This is fixed in the latest version. I need to remind myself to run
> make check-world in the future.
>
Thanks for updating the patch.
--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
| From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-21 08:13:34 |
| Message-ID: | CAFiTN-uuE9PiZK-zCTR2ch+Knb15js+=Ot1nKUMK5_5JVK4yrw@mail.gmail.com |
On Thu, Mar 17, 2022 at 2:47 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> Hi,
>
> >If you don't mind, would you like to share the POC or the branch for this work?
>
> The POC patch is attached. It currently supports the following functionalities:
Thanks for sharing, I will look into it.
> >In such cases why don't we just log the table creation WAL for DDL
> >instead of a complete statement which creates the table and inserts
> >the tuple? Because we are already WAL logging individual inserts and
> >once you make sure of replicating the table creation I think the exact
> >data insertion on the subscriber side will be taken care of by the
> >insert WALs no?
>
> The table creation WAL and table insert WAL are available. The tricky
> part is how do we break down this command into two parts (a normal
> CREATE TABLE followed by insertions) either from the parsetree or the
> WALs. I’ll have to dig more on this.
I agree that this is a bit tricky; anyway, I will also put more thought into this.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Japin Li <japinli(at)hotmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-21 20:56:54 |
| Message-ID: | CAAD30U+ZTBXLH0wWsW9+Zu2RECGKeaQNynLs7wKA0o86w8C-fw@mail.gmail.com |
Hi Japin,
> You should use a different user whose name has a different length from your current one.
> For example:
>
> px(at)localhost$ make check-world
This is fixed in the latest commit:
https://github.com/zli236/postgres/commits/ddl_replication
Thanks,
Zheng
| From: | Japin Li <japinli(at)hotmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-23 05:09:10 |
| Message-ID: | MEYP282MB166926E46397CBFC113B4A7EB6189@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, 22 Mar 2022 at 04:56, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> Hi Japin,
>
>> You should use a different user whose name has a different length from your current one.
>> For example:
>>
>> px(at)localhost$ make check-world
>
> This is fixed in the latest commit:
> https://github.com/zli236/postgres/commits/ddl_replication
>
Thanks for fixing this. I rebased the patchset on current master (383f222119)
and attached it here for further review.
--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0001-Define-DDL-replication-levels-via-the-CREATE-PUBL.patch | text/x-patch | 65.0 KB |
| v3-0002-Support-logical-logging-and-decoding-of-DDL-comma.patch | text/x-patch | 59.3 KB |
| v3-0003-Logical-replication-of-DDL-messages.patch | text/x-patch | 35.0 KB |
| From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-24 10:02:01 |
| Message-ID: | CAFiTN-sJB3Si2CXSfM8AeHk0K7binpY9r21WKmkd8y7Ebao7ZQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Mon, Mar 21, 2022 at 1:43 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Thu, Mar 17, 2022 at 2:47 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > >If you don't mind, would you like to share the POC or the branch for this work?
> >
> > The POC patch is attached. It currently supports the following functionalities:
>
> Thanks for sharing, I will look into it.
>
> > >In such cases why don't we just log the table creation WAL for DDL
> > >instead of a complete statement which creates the table and inserts
> > >the tuple? Because we are already WAL logging individual inserts and
> > >once you make sure of replicating the table creation I think the exact
> > >data insertion on the subscriber side will be taken care of by the
> > >insert WALs no?
> >
> > The table creation WAL and table insert WAL are available. The tricky
> > part is how do we break down this command into two parts (a normal
> > CREATE TABLE followed by insertions) either from the parsetree or the
> > WALs. I’ll have to dig more on this.
>
> I agree that this is a bit tricky, anyway I will also put more thoughts on this.
I had put some more thought about this, basically, during CTAS we are
generating the CreateStmt inside "create_ctas_internal" and executing
it first before inserting the tuple, so can't we generate the
independent sql just for creating the tuple maybe using deparsing or
something?
Apart from that I have one more question, basically if you are
directly logging the sql query then how you are identifying under
which schema you need to create that table, are you changing the sql
and generating schema-qualified name?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
| From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-24 11:36:07 |
| Message-ID: | CAFiTN-tqqr88EO6OHstDmmqxp2ZtZeROwkfFARuS2k+8CREPJA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, Mar 24, 2022 at 3:32 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Mon, Mar 21, 2022 at 1:43 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Thu, Mar 17, 2022 at 2:47 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > >
> > > Hi,
> > >
> > > >If you don't mind, would you like to share the POC or the branch for this work?
> > >
> > > The POC patch is attached. It currently supports the following functionalities:
> >
> > Thanks for sharing, I will look into it.
> >
> > > >In such cases why don't we just log the table creation WAL for DDL
> > > >instead of a complete statement which creates the table and inserts
> > > >the tuple? Because we are already WAL logging individual inserts and
> > > >once you make sure of replicating the table creation I think the exact
> > > >data insertion on the subscriber side will be taken care of by the
> > > >insert WALs no?
> > >
> > > The table creation WAL and table insert WAL are available. The tricky
> > > part is how do we break down this command into two parts (a normal
> > > CREATE TABLE followed by insertions) either from the parsetree or the
> > > WALs. I’ll have to dig more on this.
> >
> > I agree that this is a bit tricky, anyway I will also put more thoughts on this.
>
> I had put some more thought about this, basically, during CTAS we are
> generating the CreateStmt inside "create_ctas_internal" and executing
> it first before inserting the tuple, so can't we generate the
> independent sql just for creating the tuple maybe using deparsing or
> something?
>
> Apart from that I have one more question, basically if you are
> directly logging the sql query then how you are identifying under
> which schema you need to create that table, are you changing the sql
> and generating schema-qualified name?
I was going through the patch and it seems you are logging the search
path as well along with the query so I think this will probably work.
I have got one more query while looking into the code. In the below
code snippet you are logging DDL command only if it is a top level
query but there are no comments explaining what sort of queries we
don't want to log. Suppose I am executing a DDL statement inside a PL
then that will not be a top level statement so is your intention to
block that as well or that is an unintentional side effect?
+ /*
+ * Consider logging the DDL command if logical logging is enabled and
+ * this is a top level query.
+ */
+ if (XLogLogicalInfoActive() && isTopLevel)
+     LogLogicalDDLCommand(parsetree, queryString);
+
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-24 17:54:39 |
| Message-ID: | CAAD30UJ8v7RAxfLLx-nVO9COKNeBiKVzcwu1LdCe5dZJU2bUDA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hi Dilip,
Thanks for the feedback.
> > > > The table creation WAL and table insert WAL are available. The tricky
> > > > part is how do we break down this command into two parts (a normal
> > > > CREATE TABLE followed by insertions) either from the parsetree or the
> > > > WALs. I’ll have to dig more on this.
> > I had put some more thought about this, basically, during CTAS we are
> > generating the CreateStmt inside "create_ctas_internal" and executing
> > it first before inserting the tuple, so can't we generate the
> > independent sql just for creating the tuple maybe using deparsing or
> > something?
Yes, deparsing might help for edge cases like this. However I found
a simple solution for this specific case:
The idea is to force skipping any direct data population (which can
potentially cause data inconsistency on the subscriber)
in CREATE AS and SELECT INTO command on the subscriber by forcing the
skipData flag in the intoClause of the parsetree after
the logical replication worker parses the command. The data sync will
be taken care of by the DML replication after the DDL replication
finishes.
This is implemented in the latest commit:
https://github.com/zli236/postgres/commit/116c33451da8d44577b8d6fdb05c4b6998cd0167
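To illustrate the effect with a made-up example (table and column names
are hypothetical), forcing skipData makes the subscriber apply the
command as if WITH NO DATA had been written, so only the relation is
created there:

-- what the publisher runs
CREATE TABLE new_tab AS SELECT a, b FROM src_tab;
-- what the subscriber effectively applies once skipData is forced
CREATE TABLE new_tab AS SELECT a, b FROM src_tab WITH NO DATA;

The rows of new_tab then arrive through the normal DML replication path.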
> > Apart from that I have one more question, basically if you are
> > directly logging the sql query then how you are identifying under
> > which schema you need to create that table, are you changing the sql
> > and generating schema-qualified name?
>
> I was going through the patch and it seems you are logging the search
> path as well along with the query so I think this will probably work.
Yes, currently we log the search path as well as the user name. And we
enforce the same search path and user name when applying the DDL command
on the subscriber.
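Roughly speaking, applying a logged DDL then amounts to something like
the following (a sketch only; the role, search_path and table names are
hypothetical stand-ins for the values carried in the logged message):

SET ROLE app_owner;
SET search_path TO app_schema, public;
CREATE TABLE orders (id int PRIMARY KEY, note text);
RESET search_path;
RESET ROLE;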
> I have got one more query while looking into the code. In the below
> code snippet you are logging DDL command only if it is a top level
> query but there are no comments explaining what sort of queries we
> don't want to log. Suppose I am executing a DDL statement inside a PL
> then that will not be a top level statement so is your intention to
> block that as well or that is an unintentional side effect?
>
> + /*
> + * Consider logging the DDL command if logical logging is enabled and
> + * this is a top level query.
> + */
> + if (XLogLogicalInfoActive() && isTopLevel)
> +     LogLogicalDDLCommand(parsetree, queryString);
Good catch. The reason for having isTopLevel in the condition is
because I haven't decided if a DDL statement inside a PL should
be replicated from the user point of view. For example, if I execute a
plpgsql function or a stored procedure which creates a table under the hood,
does it always make sense to replicate the DDL without running the same
function or stored procedure on the subscriber? It probably depends on
the specific
use case. Maybe we can consider making this behavior configurable by the user.
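For a concrete (hypothetical) example of the case in question: a DDL
statement executed from a PL function goes through SPI and reaches
utility processing as a non-top-level statement, so the current check
would not log it:

CREATE FUNCTION make_audit_table() RETURNS void AS $$
BEGIN
  EXECUTE 'CREATE TABLE audit_log (id serial PRIMARY KEY, note text)';
END;
$$ LANGUAGE plpgsql;

-- the CREATE TABLE inside the function is not a top-level utility statement
SELECT make_audit_table();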
Thanks,
Zheng
| From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-03-29 04:17:26 |
| Message-ID: | CAFiTN-u9McY9Df2dUOX3u326LMJAY8uP6YE1K8GdpYp95x=S5Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, Mar 24, 2022 at 11:24 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> Hi Dilip,
>
> Thanks for the feedback.
>
> > > > > The table creation WAL and table insert WAL are available. The tricky
> > > > > part is how do we break down this command into two parts (a normal
> > > > > CREATE TABLE followed by insertions) either from the parsetree or the
> > > > > WALs. I’ll have to dig more on this.
>
> > > I had put some more thought about this, basically, during CTAS we are
> > > generating the CreateStmt inside "create_ctas_internal" and executing
> > > it first before inserting the tuple, so can't we generate the
> > > independent sql just for creating the tuple maybe using deparsing or
> > > something?
>
> Yes, deparsing might help for edge cases like this. However I found
> a simple solution for this specific case:
>
> The idea is to force skipping any direct data population (which can
> potentially cause data inconsistency on the subscriber)
> in CREATE AS and SELECT INTO command on the subscriber by forcing the
> skipData flag in the intoClause of the parsetree after
> the logical replication worker parses the command. The data sync will
> be taken care of by the DML replication after the DDL replication
> finishes.
Okay, something like that should work, I am not sure it is the best
design though.
> This is implemented in the latest commit:
> https://github.com/zli236/postgres/commit/116c33451da8d44577b8d6fdb05c4b6998cd0167
>
> > > Apart from that I have one more question, basically if you are
> > > directly logging the sql query then how you are identifying under
> > > which schema you need to create that table, are you changing the sql
> > > and generating schema-qualified name?
> >
> > I was going through the patch and it seems you are logging the search
> > path as well along with the query so I think this will probably work.
>
> Yes, currently we log the search path as well as the user name. And we
> enforce the same search path and user name when applying the DDL command
> on the subscriber.
Yeah this looks fine to me.
>
> > I have got one more query while looking into the code. In the below
> > code snippet you are logging DDL command only if it is a top level
> > query but there are no comments explaining what sort of queries we
> > don't want to log. Suppose I am executing a DDL statement inside a PL
> > then that will not be a top level statement so is your intention to
> > block that as well or that is an unintentional side effect?
> >
> > + /*
> > + * Consider logging the DDL command if logical logging is enabled and
> > + * this is a top level query.
> > + */
> > + if (XLogLogicalInfoActive() && isTopLevel)
> > +     LogLogicalDDLCommand(parsetree, queryString);
>
> Good catch. The reason for having isTopLevel in the condition is
> because I haven't decided if a DDL statement inside a PL should
> be replicated from the user point of view. For example, if I execute a
> plpgsql function or a stored procedure which creates a table under the hood,
> does it always make sense to replicate the DDL without running the same
> function or stored procedure on the subscriber? It probably depends on
> the specific
> use case. Maybe we can consider making this behavior configurable by the user.
But then this could be true for DML as well right? Like after
replicating the function to the subscriber if we are sending the DML
done by function then what's the problem in DDL. I mean if there is
no design issue in implementing this then I don't think there is much
point in blocking the same or even providing configuration for this.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | Japin Li <japinli(at)hotmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-06 11:37:06 |
| Message-ID: | CAFPTHDYjRsuehWReLNodspMQbmGc=gs3Ge8PTFur5T+TNC+-5g@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, Mar 23, 2022 at 4:09 PM Japin Li <japinli(at)hotmail(dot)com> wrote:
>
>
> On Tue, 22 Mar 2022 at 04:56, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > Hi Japin,
> >
> >> You should use a different user whose name has a different length from your current one.
> >> For example:
> >>
> >> px(at)localhost$ make check-world
> >
> > This is fixed in the latest commit:
> > https://github.com/zli236/postgres/commits/ddl_replication
> >
>
> Thanks for fixing this. I rebased the patchset on current master (383f222119)
> and attached it here for further review.
>
The patch no longer applies. It is a very good attempt, and I would also
like to contribute if required. I have a few comments but will hold them
until a rebased version is available.
regards,
Ajin Cherian
Fujitsu Australia
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Japin Li <japinli(at)hotmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-07 10:16:32 |
| Message-ID: | CAA4eK1J4AekmEKgmfp6e-zZz4M02m7w7uxvC2tjqmjF-LDSGDA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, Mar 23, 2022 at 10:39 AM Japin Li <japinli(at)hotmail(dot)com> wrote:
>
> Thanks for fixing this. I rebased the patchset on current master (383f222119)
> and attached it here for further review.
>
Some initial comments:
===================
1.
+/*
+ * Write logical decoding DDL message into XLog.
+ */
+XLogRecPtr
+LogLogicalDDLMessage(const char *prefix, Oid roleoid, const char *message,
+ size_t size, bool transactional)
I don't see the patch using a non-transactional message anywhere. Is it
required for any future usage? The non-transactional behavior has some
known consistency issues; see email [1].
2. For DDL replication, do we need to wait for a consistent point of
snapshot? For DMLs, that point is a convenient point to initialize
replication from, which is why we export a snapshot at that point,
which is used to read normal data. Do we have any similar needs for
DDL replication?
3. The patch seems to be creating an entry in pg_subscription_rel for
'create' message. Do we need some handling on Drop, if not, why? I
think adding some comments for this aspect would make it easier to
follow.
4. The handling related to partition tables seems missing because, on
the subscriber-side, it always creates a relation entry in
pg_subscription_rel which won't work. Check its interaction with
publish_via_partition_root.
--
With Regards,
Amit Kapila.
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-07 10:19:10 |
| Message-ID: | CAA4eK1LC=K7vqBBPGUovYJ9voqmf2-nEcpfonO10weAJQhob4A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, Mar 29, 2022 at 9:47 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Thu, Mar 24, 2022 at 11:24 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> >
> >
> > Good catch. The reason for having isTopLevel in the condition is
> > because I haven't decided if a DDL statement inside a PL should
> > be replicated from the user point of view. For example, if I execute a
> > plpgsql function or a stored procedure which creates a table under the hood,
> > does it always make sense to replicate the DDL without running the same
> > function or stored procedure on the subscriber? It probably depends on
> > the specific
> > use case. Maybe we can consider making this behavior configurable by the user.
>
> But then this could be true for DML as well right? Like after
> replicating the function to the subscriber if we are sending the DML
> done by function then what's the problem in DDL. I mean if there is
> no design issue in implementing this then I don't think there is much
> point in blocking the same or even providing configuration for this.
>
Valid point. I think if there are some design/implementation
constraints for this, then it is better to document those (probably as
comments).
--
With Regards,
Amit Kapila.
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-08 11:00:13 |
| Message-ID: | CAA4eK1KGUyAh5_UG=RvhbeQh-aHvtgb4tsZY-5fb1Yw29FSrHw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, Mar 29, 2022 at 9:47 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > The idea is to force skipping any direct data population (which can
> > potentially cause data inconsistency on the subscriber)
> > in CREATE AS and SELECT INTO command on the subscriber by forcing the
> > skipData flag in the intoClause of the parsetree after
> > the logical replication worker parses the command. The data sync will
> > be taken care of by the DML replication after the DDL replication
> > finishes.
>
> Okay, something like that should work, I am not sure it is the best
> design though.
>
Even if this works, how will we make Alter Table statement work where
it needs to rewrite the table? There also I think we can face a
similar problem if we directly send the statement, once the table will
be updated due to the DDL statement and then again due to table
rewrite as that will have a separate WAL.
Another somewhat unrelated problem I see with this work is how to save
recursion of the same command between nodes (when the involved nodes
replicate DDLs). For DMLs, we can avoid that via replication origins
as is being done in the patch proposed [1] but not sure how will we
deal with that here?
[1] - https://commitfest.postgresql.org/38/3610/
--
With Regards,
Amit Kapila.
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-08 11:13:08 |
| Message-ID: | CAA4eK1JE4wh0iQ=+Dy69FqpStv6Z7aOtFiyiFcvs_phj0sxgUA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, Mar 17, 2022 at 3:36 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> Did you see some old code I wrote towards this goal?
> https://www.postgresql.org/message-id/[email protected]
> The intention was that DDL would produce some JSON blob that accurately
> describes the DDL that was run;
>
I have read that thread and found one of your emails [1] where you
seem to be saying that JSON representation is not required for BDR.
Will going via the JSON blob route in some way make this project
easier/better?
> the caller can acquire that and use it
> to produce working DDL that doesn't depend on runtime conditions.
>
For runtime conditions, one of the things you have mentioned in that
thread is to add schema name in the statement at the required places
which this patch deals with in a different way by explicitly sending
it along with the DDL statement. The other cases where we might need
deparsing are Alter Table type cases (where we need to rewrite the
table) where we may want to send a different DDL. I haven't analyzed
but I think it is better to have a list of all the cases where we need deparsing
and what is the best way to deal with it. The simpler cases seem to be
working with the approach proposed by this patch but I am not sure if
it will work for all kinds of cases.
--
With Regards,
Amit Kapila.
| From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-08 11:34:37 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On 2022-Apr-08, Amit Kapila wrote:
> On Thu, Mar 17, 2022 at 3:36 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >
> > Did you see some old code I wrote towards this goal?
> > https://www.postgresql.org/message-id/[email protected]
> > The intention was that DDL would produce some JSON blob that accurately
> > describes the DDL that was run;
>
> I have read that thread and found one of your emails [1] where you
> seem to be saying that JSON representation is not required for BDR.
> Will going via the JSON blob route in some way make this project
> easier/better?
I don't know if replication support will be easier by using JSON; I just
think that JSON makes the overall feature more easily usable for other
purposes.
I am not familiar with BDR replication nowadays.
> For runtime conditions, one of the things you have mentioned in that
> thread is to add schema name in the statement at the required places
> which this patch deals with in a different way by explicitly sending
> it along with the DDL statement.
Hmm, ok. The point of the JSON-blob route is that the publisher sends a
command representation that can be parsed/processed/transformed
arbitrarily by the subscriber using generic rules; it should be trivial
to use a JSON tool to change schema A to schema B in any arbitrary DDL
command, and produce another working DDL command without having to know
how to write that command specifically. So if I have a rule that
"schema A there is schema B here", all DDL commands can be replayed with
no further coding (without having to rely on getting the run-time
search_path correct.)
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"El sudor es la mejor cura para un pensamiento enfermo" (Bardia)
| From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-08 13:12:40 |
| Message-ID: | CA+TgmoauXRQ3yDZNGTzXv_m1kdUnH1Ww+hwKmKUSjtyBh0Em2Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Apr 8, 2022 at 7:34 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > For runtime conditions, one of the things you have mentioned in that
> > thread is to add schema name in the statement at the required places
> > which this patch deals with in a different way by explicitly sending
> > it along with the DDL statement.
>
> Hmm, ok. The point of the JSON-blob route is that the publisher sends a
> command representation that can be parsed/processed/transformed
> arbitrarily by the subscriber using generic rules; it should be trivial
> to use a JSON tool to change schema A to schema B in any arbitrary DDL
> command, and produce another working DDL command without having to know
> how to write that command specifically. So if I have a rule that
> "schema A there is schema B here", all DDL commands can be replayed with
> no further coding (without having to rely on getting the run-time
> search_path correct.)
Yeah, that was a really nice aspect of that approach.
--
Robert Haas
EDB: http://www.enterprisedb.com
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Japin Li <japinli(at)hotmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-11 05:00:43 |
| Message-ID: | CAA4eK1JeGcos6UMu8F4aOTSBAYYqfaL4bE=PGWE43BPgCLaNCg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, Apr 7, 2022 at 3:46 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Mar 23, 2022 at 10:39 AM Japin Li <japinli(at)hotmail(dot)com> wrote:
>
> 2. For DDL replication, do we need to wait for a consistent point of
> snapshot? For DMLs, that point is a convenient point to initialize
> replication from, which is why we export a snapshot at that point,
> which is used to read normal data. Do we have any similar needs for
> DDL replication?
>
I have thought a bit more about this and I think we need to build the
snapshot for DML replication as we need to read catalog tables to
decode the corresponding WAL but it is not clear to me if we have a
similar requirement for DDL replication. If the catalog access is
required then it makes sense to follow the current snapshot model,
otherwise, we may need to think differently for DDL replication.
One more related point is that for DML replication, we do ensure that
we copy the entire data of the table (via initial sync) which exists
even before the publication for that table exists, so do we want to do
something similar for DDLs? How do we sync the schema of the table
before the user has defined the publication? Say the table has been
created before the publication is defined and after that, there are
only Alter statements, so do we expect, users to create the table on
the subscriber and then we can replicate the Alter statements? And
even if we do that it won't be clear which Alter statements will be
replicated after publication is defined especially if those Alters
happened concurrently with defining publications?
--
With Regards,
Amit Kapila.
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-11 05:04:18 |
| Message-ID: | CAA4eK1Ljft2OSD1=+Ptx6GKKU76KHZW+R10GdTqvhPrLn1JsDA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Apr 8, 2022 at 5:04 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2022-Apr-08, Amit Kapila wrote:
>
>
> > For runtime conditions, one of the things you have mentioned in that
> > thread is to add schema name in the statement at the required places
> > which this patch deals with in a different way by explicitly sending
> > it along with the DDL statement.
>
> Hmm, ok. The point of the JSON-blob route is that the publisher sends a
> command representation that can be parsed/processed/transformed
> arbitrarily by the subscriber using generic rules; it should be trivial
> to use a JSON tool to change schema A to schema B in any arbitrary DDL
> command, and produce another working DDL command without having to know
> how to write that command specifically. So if I have a rule that
> "schema A there is schema B here", all DDL commands can be replayed with
> no further coding (without having to rely on getting the run-time
> search_path correct.)
>
Okay, thanks for the clarification.
--
With Regards,
Amit Kapila.
| From: | "Euler Taveira" <euler(at)eulerto(dot)com> |
|---|---|
| To: | "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com> |
| Cc: | "Zheng Li" <zhengli10(at)gmail(dot)com>, "Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>, "Dilip Kumar" <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-11 12:46:06 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Mon, Apr 11, 2022, at 2:00 AM, Amit Kapila wrote:
> On Thu, Apr 7, 2022 at 3:46 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Mar 23, 2022 at 10:39 AM Japin Li <japinli(at)hotmail(dot)com> wrote:
> >
> > 2. For DDL replication, do we need to wait for a consistent point of
> > snapshot? For DMLs, that point is a convenient point to initialize
> > replication from, which is why we export a snapshot at that point,
> > which is used to read normal data. Do we have any similar needs for
> > DDL replication?
> >
>
> I have thought a bit more about this and I think we need to build the
> snapshot for DML replication as we need to read catalog tables to
> decode the corresponding WAL but it is not clear to me if we have a
> similar requirement for DDL replication. If the catalog access is
> required then it makes sense to follow the current snapshot model,
> otherwise, we may need to think differently for DDL replication.
>
> One more related point is that for DML replication, we do ensure that
> we copy the entire data of the table (via initial sync) which exists
> even before the publication for that table exists, so do we want to do
> something similar for DDLs? How do we sync the schema of the table
> before the user has defined the publication? Say the table has been
> created before the publication is defined and after that, there are
> only Alter statements, so do we expect, users to create the table on
> the subscriber and then we can replicate the Alter statements? And
> even if we do that it won't be clear which Alter statements will be
> replicated after publication is defined especially if those Alters
> happened concurrently with defining publications?
The *initial* DDL replication is a different problem than DDL replication. The
former requires a snapshot to read the current catalog data and build a CREATE
command as part of the subscription process. The subsequent DDLs in that object
will be handled by a different approach that is being discussed here.
I'm planning to work on the initial DDL replication. I'll open a new thread as
soon as I write a design for it. Just as an example, the pglogical approach is
to use pg_dump behind the scenes to provide the schema [1]. It is a reasonable
approach but an optimal solution should be an API to provide the initial DDL
commands. I mean the main point of this feature is to have an API to create an
object that logical replication can use for initial schema
synchronization. This "DDL to create an object" was already discussed in the
past [2].
[1] https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_sync.c#L942
[2] https://www.postgresql.org/message-id/4E69156E.5060509%40dunslane.net
--
Euler Taveira
EDB https://www.enterprisedb.com/
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-11 16:42:01 |
| Message-ID: | CAAD30UKeRsGfqqTC-qTf_3auqz43FTTHhfhbi3b_1_=ye818fw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hi,
> > Good catch. The reason for having isTopLevel in the condition is
> > because I haven't decided if a DDL statement inside a PL should
> > be replicated from the user point of view. For example, if I execute a
> > plpgsql function or a stored procedure which creates a table under the hood,
> > does it always make sense to replicate the DDL without running the same
> > function or stored procedure on the subscriber? It probably depends on
> > the specific
> > use case. Maybe we can consider making this behavior configurable by the user.
>
> But then this could be true for DML as well right? Like after
> replicating the function to the subscriber if we are sending the DML
> done by function then what's the problem in DDL. I mean if there is
> no design issue in implementing this then I don't think there is much
> point in blocking the same or even providing configuration for this.
Agreed. I'll unblock DDLs in functions/procedures and test it out. I
found that some DDLs in functions are replicated multiple times on the
subscriber while they should only be replicated once. Still trying to
figure out why.
>The patch no longer applies. It is a very good attempt, and I would also
>like to contribute if required. I have a few comments but will hold them
>until a rebased version is available.
Hi, Ajin. That'll be great! I'll rebase my github branch to master
head. In the meantime, this github branch still works
https://github.com/zli236/postgres/tree/ddl_replication
Regards,
Zheng
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-11 17:31:17 |
| Message-ID: | CAAD30UKvv5=k6BY+JAF1fWzrYNbGcB0DEdNi1FMokULzOwSxcQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hi Amit,
> Some initial comments:
> ===================
> 1.
> +/*
> + * Write logical decoding DDL message into XLog.
> + */
> +XLogRecPtr
> +LogLogicalDDLMessage(const char *prefix, Oid roleoid, const char *message,
> + size_t size, bool transactional)
>
> I don't see the patch using a non-transactional message anywhere. Is it
> required for any future usage? The non-transactional behavior has some
> known consistency issues; see email [1].
The transactional flag is not required by the current usage. I thought
it might be useful if other logical decoding plugins want to log and
consume DDL messages in a non-transactional way. But I don't have a
specific use case yet.
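For reference, the existing generic logical message facility already
exposes this choice at the SQL level; the sketch below only illustrates
the transactional/non-transactional difference and is not how the patch
emits DDL messages:

-- transactional: decoded only when the surrounding transaction commits
SELECT pg_logical_emit_message(true, 'ddl', 'CREATE TABLE t1 (id int)');
-- non-transactional: decoded as soon as the record is read from WAL
SELECT pg_logical_emit_message(false, 'ddl', 'CREATE TABLE t1 (id int)');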
> 2. For DDL replication, do we need to wait for a consistent point of
> snapshot? For DMLs, that point is a convenient point to initialize
> replication from, which is why we export a snapshot at that point,
> which is used to read normal data. Do we have any similar needs for
> DDL replication?
The current design requires manual schema initialization on the
subscriber before the logical replication setup. As Euler Taveira
pointed out, a snapshot is needed for initial schema synchronization,
and that is a different topic.
> 3. The patch seems to be creating an entry in pg_subscription_rel for
> 'create' message. Do we need some handling on Drop, if not, why? I
> think adding some comments for this aspect would make it easier to
> follow.
It's already handled by existing logic in heap_drop_with_catalog:
https://github.com/zli236/postgres/blob/ddl_replication/src/backend/catalog/heap.c#L2005
I'll add some comments.
> 4. The handling related to partition tables seems missing because, on
> the subscriber-side, it always creates a relation entry in
> pg_subscription_rel which won't work. Check its interaction with
> publish_via_partition_root.
I will test it out.
>Even if this works, how will we make Alter Table statement work where
>it needs to rewrite the table? There also I think we can face a
>similar problem if we directly send the statement, once the table will
>be updated due to the DDL statement and then again due to table
>rewrite as that will have a separate WAL.
Yes, I think any DDL that can generate DML changes should be listed
out and handled properly or documented. Here is one extreme example
involving volatile functions:
ALTER TABLE nd_ddl ADD COLUMN t timestamp DEFAULT now().
Again, I think we need to somehow skip the data rewrite on the
subscriber when replicating such DDL and let DML replication handle
the rewrite.
>Another somewhat unrelated problem I see with this work is how to save
>recursion of the same command between nodes (when the involved nodes
>replicate DDLs). For DMLs, we can avoid that via replication origins
>as is being done in the patch proposed [1] but not sure how will we
>deal with that here?
I'll need to investigate "recursion of the same command between
nodes", could you provide an example?
Regards,
Zheng
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Euler Taveira <euler(at)eulerto(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-11 17:36:42 |
| Message-ID: | CAAD30UKtDOmjMOyXG6bhpKj9eswQ31scPR50Ne37AfUfEhKb8A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> I'm planning to work on the initial DDL replication. I'll open a new thread as
> soon as I write a design for it. Just as an example, the pglogical approach is
> to use pg_dump behind the scenes to provide the schema [1]. It is a reasonable
> approach but an optimal solution should be an API to provide the initial DDL
> commands. I mean the main point of this feature is to have an API to create an
> object that the logical replication can use it for initial schema
> synchronization. This "DDL to create an object" was already discussed in the
> past [2].
Nice! I think automatic initial schema synchronization for replication
is a very useful feature.
Regards,
Zheng
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Euler Taveira <euler(at)eulerto(dot)com> |
| Cc: | japin <japinli(at)hotmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-12 10:55:30 |
| Message-ID: | CAA4eK1+w_dFytBiv3RxbOL76_noMzmX0QGTc8uS=bc2WaPVoow@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Mon, Apr 11, 2022 at 6:16 PM Euler Taveira <euler(at)eulerto(dot)com> wrote:
>
> On Mon, Apr 11, 2022, at 2:00 AM, Amit Kapila wrote:
>
> On Thu, Apr 7, 2022 at 3:46 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Mar 23, 2022 at 10:39 AM Japin Li <japinli(at)hotmail(dot)com> wrote:
> >
> > 2. For DDL replication, do we need to wait for a consistent point of
> > snapshot? For DMLs, that point is a convenient point to initialize
> > replication from, which is why we export a snapshot at that point,
> > which is used to read normal data. Do we have any similar needs for
> > DDL replication?
> >
>
> I have thought a bit more about this and I think we need to build the
> snapshot for DML replication as we need to read catalog tables to
> decode the corresponding WAL but it is not clear to me if we have a
> similar requirement for DDL replication. If the catalog access is
> required then it makes sense to follow the current snapshot model,
> otherwise, we may need to think differently for DDL replication.
>
> One more related point is that for DML replication, we do ensure that
> we copy the entire data of the table (via initial sync) which exists
> even before the publication for that table exists, so do we want to do
> something similar for DDLs? How do we sync the schema of the table
> before the user has defined the publication? Say the table has been
> created before the publication is defined and after that, there are
> only Alter statements, so do we expect, users to create the table on
> the subscriber and then we can replicate the Alter statements? And
> even if we do that it won't be clear which Alter statements will be
> replicated after publication is defined especially if those Alters
> happened concurrently with defining publications?
>
> The *initial* DDL replication is a different problem than DDL replication. The
> former requires a snapshot to read the current catalog data and build a CREATE
> command as part of the subscription process. The subsequent DDLs in that object
> will be handled by a different approach that is being discussed here.
>
I think they are not completely independent because of the current way
to do initial sync followed by replication. The initial sync and
replication need some mechanism to ensure that one of those doesn't
overwrite the work done by the other. Now, the initial idea and patch
can be developed separately but I think both the patches have some
dependency.
--
With Regards,
Amit Kapila.
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-12 11:12:29 |
| Message-ID: | CAA4eK1JzCv2zHtRBvBt9vk88SBv5PavuJCnL_UQUrN78rgvVhQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Mon, Apr 11, 2022 at 11:01 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> >Even if this works, how will we make Alter Table statement work where
> >it needs to rewrite the table? There also I think we can face a
> >similar problem if we directly send the statement, once the table will
> >be updated due to the DDL statement and then again due to table
> >rewrite as that will have a separate WAL.
>
> Yes, I think any DDL that can generate DML changes should be listed
> out and handled properly or documented. Here is one extreme example
> involving volatile functions:
> ALTER TABLE nd_ddl ADD COLUMN t timestamp DEFAULT now().
> Again, I think we need to somehow skip the data rewrite on the
> subscriber when replicating such DDL and let DML replication handle
> the rewrite.
>
I am not sure what is the right way to deal with this but another idea
we can investigate is to probably rewrite the DDL such that it doesn't
rewrite the table.
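One possible shape of that idea, sketched with a deliberately volatile
default (clock_timestamp()) so that the original form really does force
a rewrite:

-- original command: the volatile default forces a full table rewrite
ALTER TABLE nd_ddl ADD COLUMN t timestamp DEFAULT clock_timestamp();
-- a rewrite-free variant the publisher could send instead
ALTER TABLE nd_ddl ADD COLUMN t timestamp;
ALTER TABLE nd_ddl ALTER COLUMN t SET DEFAULT clock_timestamp();

Note that the variant leaves existing rows without a value for t, so
they would still have to be reconciled by some other mechanism.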
> >Another somewhat unrelated problem I see with this work is how to save
> >recursion of the same command between nodes (when the involved nodes
> >replicate DDLs). For DMLs, we can avoid that via replication origins
> >as is being done in the patch proposed [1] but not sure how will we
> >deal with that here?
>
> I'll need to investigate "recursion of the same command between
> nodes", could you provide an example?
>
See email [1]. I think the same solution should work for your proposal
as well because I see that LogLogicalMessage includes origin but it is
better if you can confirm it.
--
With Regards,
Amit Kapila.
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-13 00:19:23 |
| Message-ID: | CAAD30ULD_igm8RVy75EJrMYYj4HASYOf4UW_0zjLV-sqBqcbJw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hi,
Here is the rebased new branch
https://github.com/zli236/postgres/commits/ddl_replication
Regards,
Zheng
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-13 03:47:59 |
| Message-ID: | CAA4eK1JW3ZiF6w6M57EBQxYvsNZda7k-P0pCD4QLk31KbU2fOw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, Apr 13, 2022 at 5:49 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> Hi,
>
> Here is the rebased new branch
> https://github.com/zli236/postgres/commits/ddl_replication
>
Thanks, but it would be good if you can share it in the patch form as
well unless you need to send patches too frequently. It is easier to
give comments.
--
With Regards,
Amit Kapila.
| From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Euler Taveira <euler(at)eulerto(dot)com>, japin <japinli(at)hotmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-13 09:07:54 |
| Message-ID: | CAFiTN-vKm8JpX25VCMTqG7V_qTJ04nugCDkSob5uWfvk4nQqjQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, Apr 12, 2022 at 4:25 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > The *initial* DDL replication is a different problem than DDL replication. The
> > former requires a snapshot to read the current catalog data and build a CREATE
> > command as part of the subscription process. The subsequent DDLs in that object
> > will be handled by a different approach that is being discussed here.
> >
>
> I think they are not completely independent because of the current way
> to do initial sync followed by replication. The initial sync and
> replication need some mechanism to ensure that one of those doesn't
> overwrite the work done by the other. Now, the initial idea and patch
> can be developed separately but I think both the patches have some
> dependency.
I agree with the point that their design can not be completely
independent. They have some logical relationship of what schema will
be copied by the initial sync and where is the exact boundary from
which we will start sending as replication. And suppose first we only
plan to implement the replication part then how the user will know
what all schema user has to create and what will be replicated using
DDL replication? Suppose the user takes a dump and copies all the
schema and then creates the subscription, then how are we going to
handle the DDL concurrent to the subscription command?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Cc: | Euler Taveira <euler(at)eulerto(dot)com>, japin <japinli(at)hotmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-13 09:50:20 |
| Message-ID: | CAA4eK1+5EmorT95tJmEGbYrKL0B24v65eu3r6BV_PHmxqdJ4Tw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, Apr 13, 2022 at 2:38 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Tue, Apr 12, 2022 at 4:25 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > > The *initial* DDL replication is a different problem than DDL replication. The
> > > former requires a snapshot to read the current catalog data and build a CREATE
> > > command as part of the subscription process. The subsequent DDLs in that object
> > > will be handled by a different approach that is being discussed here.
> > >
> >
> > I think they are not completely independent because of the current way
> > to do initial sync followed by replication. The initial sync and
> > replication need some mechanism to ensure that one of those doesn't
> > overwrite the work done by the other. Now, the initial idea and patch
> > can be developed separately but I think both the patches have some
> > dependency.
>
> I agree with the point that their design can not be completely
> independent. They have some logical relationship of what schema will
> be copied by the initial sync and where is the exact boundary from
> which we will start sending as replication. And suppose first we only
> plan to implement the replication part then how the user will know
> what all schema user has to create and what will be replicated using
> DDL replication? Suppose the user takes a dump and copies all the
> schema and then creates the subscription, then how are we going to
> handle the DDL concurrent to the subscription command?
>
Right, I also don't see how it can be done in the current
implementation. So, I think even if we want to develop these two as
separate patches they need to be integrated to make the solution
complete.
--
With Regards,
Amit Kapila.
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, japin <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-13 17:22:46 |
| Message-ID: | CAAD30UKJJXSMCb0Xk9ZtS1si4KzbasmYzbJjD8rZxTX2HWnGZA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
>Thanks, but it would be good if you can share it in the patch form as
>well unless you need to send patches too frequently. It is easier to
>give comments.
The rebased patches are attached. They apply on commit ed0fbc8.
> > I agree with the point that their design can not be completely
> > independent. They have some logical relationship of what schema will
> > be copied by the initial sync and where is the exact boundary from
> > which we will start sending as replication. And suppose first we only
> > plan to implement the replication part then how the user will know
> > what all schema user has to create and what will be replicated using
> > DDL replication? Suppose the user takes a dump and copies all the
> > schema and then creates the subscription, then how are we going to
> > handle the DDL concurrent to the subscription command?
> >
>
> Right, I also don't see how it can be done in the current
> implementation. So, I think even if we want to develop these two as
> separate patches they need to be integrated to make the solution
> complete.
The concurrent DDL issue actually exists today even with DML
replication, i.e., the user has to make sure the table schema is in
sync before starting logical replication.
Here is a high level idea to implement schema initialization:
1. On source DB: dump the schema and create a logical replication slot
on a snapshot at the same time
2. On target DB: restore the schema
3. On target DB: create subscription by pointing it to the logical
replication slot and the snapshot in step 1, so that the initial sync
is based on the snapshot and concurrent changes are replicated using
the logical replication slot.
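Very roughly, and only as a sketch of how existing building blocks might
be combined (slot, publication and connection values are invented; step 1
relies on the replication protocol's CREATE_REPLICATION_SLOT exporting a
snapshot that pg_dump can use via --snapshot):

-- 1. on the source, over a replication connection:
--      CREATE_REPLICATION_SLOT ddl_init_slot LOGICAL pgoutput EXPORT_SNAPSHOT;
--    then dump the schema against the exported snapshot:
--      pg_dump --schema-only --snapshot=<exported name> sourcedb > schema.sql
-- 2. on the target: restore schema.sql with psql
-- 3. on the target: attach to the pre-created slot
CREATE SUBSCRIPTION ddl_sub
    CONNECTION 'host=source dbname=sourcedb'
    PUBLICATION pub1
    WITH (create_slot = false, slot_name = 'ddl_init_slot');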
Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| v4-0005-Support-replication-of-CREATE-.-AS-.-and-SELECT-.-IN.patch | application/octet-stream | 8.0 KB |
| v4-0001-Define-DDL-replication-levels-via-the-CREATE-PUBLICA.patch | application/octet-stream | 65.0 KB |
| v4-0004-Enable-replication-of-CREATE-MATERIALIZED-VIEW-AS-st.patch | application/octet-stream | 4.1 KB |
| v4-0003-Logical-replication-of-DDL-messages.patch | application/octet-stream | 41.2 KB |
| v4-0002-Support-logical-logging-and-decoding-of-DDL-command-.patch | application/octet-stream | 61.3 KB |
| v4-0006-Add-couple-tests-for-DDL-replication-case.patch | application/octet-stream | 4.5 KB |
| From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-14 09:26:39 |
| Message-ID: | CAFiTN-vz2V=u43Zkr7Y9Lz0w7G5qsQk=fKnc1x3ZcBO9_f0gbg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Apr 8, 2022 at 4:30 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Mar 29, 2022 at 9:47 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > The idea is to force skipping any direct data population (which can
> > > potentially cause data inconsistency on the subscriber)
> > > in CREATE AS and SELECT INTO command on the subscriber by forcing the
> > > skipData flag in the intoClause of the parsetree after
> > > the logical replication worker parses the command. The data sync will
> > > be taken care of by the DML replication after the DDL replication
> > > finishes.
> >
> > Okay, something like that should work, I am not sure it is the best
> > design though.
> >
>
> Even if this works, how will we make Alter Table statement work where
> it needs to rewrite the table? There also I think we can face a
> similar problem if we directly send the statement, once the table will
> be updated due to the DDL statement and then again due to table
> rewrite as that will have a separate WAL.
I agree. But the bigger question here is: what is the correct behavior
in the case of ALTER TABLE? For example, if the table gets rewritten on
the publisher due to an access method change, then what should the
behavior of the subscriber be? One expected behavior is that the access
method also gets changed on the subscriber while the data remains the
same as in the subscriber's table (the table can be locally rewritten
based on the new AM), which seems quite sensible to me. But if we want
this behavior then we cannot replay the logical messages generated by
the DML WAL of the table rewrite, otherwise we will get duplicate data,
unless we plan to get rid of the current data and just get all the new
data from the publisher. And if we do that then the data will be as per
the latest data in the table on the publisher. So I think first we need
to define the correct behavior and then we can design it accordingly.
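For concreteness, the kind of rewrite being discussed can be produced
with the access method syntax added in PostgreSQL 15; "foo" and
"some_other_am" are placeholders for any table and any installed table
access method:

-- rewrites table foo on the publisher using the new access method
ALTER TABLE foo SET ACCESS METHOD some_other_am;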
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
| From: | "Euler Taveira" <euler(at)eulerto(dot)com> |
|---|---|
| To: | "Dilip Kumar" <dilipbalaut(at)gmail(dot)com>, "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | "Zheng Li" <zhengli10(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "rajesh singarapu" <rajesh(dot)rs0541(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-14 14:15:20 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, Apr 14, 2022, at 6:26 AM, Dilip Kumar wrote:
> I agree. But the bigger question here is: what is the correct behavior
> in the case of ALTER TABLE? For example, if the table gets rewritten on
> the publisher due to an access method change, then what should the
> behavior of the subscriber be? One expected behavior is that the access
> method also gets changed on the subscriber while the data remains the
> same as in the subscriber's table (the table can be locally rewritten
> based on the new AM), which seems quite sensible to me. But if we want
> this behavior then we cannot replay the logical messages generated by
> the DML WAL of the table rewrite, otherwise we will get duplicate data,
> unless we plan to get rid of the current data and just get all the new
> data from the publisher. And if we do that then the data will be as per
> the latest data in the table on the publisher. So I think first we need
> to define the correct behavior and then we can design it accordingly.
You should forbid it. Unless you can decompose the command into multiple SQL
commands to make it a safe operation for logical replication.
Let's say you want to add a column with a volatile default.
ALTER TABLE foo ADD COLUMN bar double precision DEFAULT random();
If you replicate the DDL command as is, you will have different data
downstream. You should forbid it. However, this operation can be supported if
the DDL command is decomposed in multiple steps.
-- add a new column without DEFAULT to avoid rewrite
ALTER TABLE foo ADD COLUMN bar double precision;
-- future rows could use the DEFAULT expression
-- it also doesn't rewrite the table
ALTER TABLE foo ALTER COLUMN bar SET DEFAULT random();
-- it effectively rewrites the table
-- all rows are built from one source node
-- data will be the same on all nodes
UPDATE foo SET bar = random();
The ALTER TABLE ... ALTER COLUMN ... TYPE has a similar issue. This DDL command
can be decomposed to avoid the rewrite. If you are changing the data type, in
general, you add a new column and update all rows doing the proper conversion.
(If you are updating in batches, you usually add a trigger to automatically
adjust the new column value for INSERTs and UPDATEs.) Another case is when you
are reducing the typmod (for example, varchar(100) to varchar(20)). In this
case, the DDL command can be decomposed by removing the typmod information (ALTER
TABLE ... ALTER COLUMN ... TYPE varchar) and replacing it with a CHECK
constraint.
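A minimal sketch of that last decomposition (table, column, and
constraint names are illustrative):

-- instead of the rewriting form:
--   ALTER TABLE t ALTER COLUMN c TYPE varchar(20);
-- drop the typmod (no rewrite) and enforce the limit with a constraint
ALTER TABLE t ALTER COLUMN c TYPE varchar;
ALTER TABLE t ADD CONSTRAINT c_max_20 CHECK (char_length(c) <= 20);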
I didn't review this patch in depth but we certainly need to impose some DDL
restrictions if we are replicating DDLs. There are other cases that should be
treated accordingly such as a TABLESPACE specification or a custom data type.
--
Euler Taveira
EDB https://www.enterprisedb.com/
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Euler Taveira <euler(at)eulerto(dot)com> |
| Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-14 20:31:15 |
| Message-ID: | CAAD30ULkkGsFD_rUs1p5ReMuvrSk8vX0W0P17RGa-V3b+7JLFw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> You should forbid it. Unless you can decompose the command into multiple SQL
> commands to make it a safe operation for logical replication.
>
> Let's say you want to add a column with a volatile default.
>
> ALTER TABLE foo ADD COLUMN bar double precision DEFAULT random();
>
> If you replicate the DDL command as is, you will have different data
> downstream. You should forbid it. However, this operation can be supported if
> the DDL command is decomposed in multiple steps.
>
> -- add a new column without DEFAULT to avoid rewrite
> ALTER TABLE foo ADD COLUMN bar double precision;
>
> -- future rows could use the DEFAULT expression
> -- it also doesn't rewrite the table
> ALTER TABLE foo ALTER COLUMN bar SET DEFAULT random();
>
> -- it effectively rewrites the table
> -- all rows are built from one source node
> -- data will be the same on all nodes
> UPDATE foo SET bar = random();
>
> The ALTER TABLE ... ALTER COLUMN ... TYPE has a similar issue. This DDL command
> can be decomposed to avoid the rewrite. If you are changing the data type, in
> general, you add a new column and update all rows doing the proper conversion.
> (If you are updating in batches, you usually add a trigger to automatically
> adjust the new column value for INSERTs and UPDATEs.) Another case is when you
> are reducing the typmod (for example, varchar(100) to varchar(20)). In this
> case, the DDL command can be decomposed by removing the typmod information (ALTER
> TABLE ... ALTER COLUMN ... TYPE varchar) and replacing it with a CHECK
> constraint.
>
> I didn't review this patch in depth but we certainly need to impose some DDL
> restrictions if we are replicating DDLs. There are other cases that should be
> treated accordingly such as a TABLESPACE specification or a custom data type.
This is helpful. Thanks.
Zheng
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-15 02:08:58 |
| Message-ID: | CAAD30UKRGo_VrdFKYOzWLLRnynjU7DC0uw9mb8LWCPFSnOxXAg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> > But then this could be true for DML as well right? Like after
> > replicating the function to the subscriber if we are sending the DML
> > done by function then what's the problem in DDL. I mean if there is
> > no design issue in implementing this then I don't think there is much
> > point in blocking the same or even providing configuration for this.
>
> Agreed. I'll unblock DDLs in functions/procedures and test it out. I
> find out some DDLs in functions are replicated multiple times on the
> subscriber while they should only be replicated once. Still trying to
> figure out why.
Here is the patch unblocking DDLs in functions and procedures. It also
fixes a bug where a DDL with a sub-command could get logged twice.
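As a quick illustration of the kind of case this enables (the object
names are made up):

-- with the attached patch, a DDL executed inside a procedure on the
-- publisher is expected to be captured and replicated exactly once
CREATE PROCEDURE make_audit_table() LANGUAGE plpgsql AS $$
BEGIN
    CREATE TABLE audit_log (id int, note text);
END;
$$;
CALL make_audit_table();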
github commit of the same patch:
https://github.com/zli236/postgres/commit/d0fe6065ee3cb6051e5b026f17b82f5220903e6f
Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| v4-0007-Enable-logging-and-replication-of-DDLs-in-function.patch | application/octet-stream | 17.5 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Euler Taveira <euler(at)eulerto(dot)com> |
| Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-04-29 17:34:32 |
| Message-ID: | CAAD30ULUBpNegQGEx4_CO-eoz4Z1eYmfOKOTPC7ajmT5_n1mVQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hello,
> You should forbid it. Unless you can decompose the command into multiple SQL
> commands to make it a safe operation for logical replication.
>
> Let's say you want to add a column with a volatile default.
>
> ALTER TABLE foo ADD COLUMN bar double precision DEFAULT random();
>
> If you replicate the DDL command as is, you will have different data
> downstream. You should forbid it. However, this operation can be supported if
> the DDL command is decomposed in multiple steps.
>
> -- add a new column without DEFAULT to avoid rewrite
> ALTER TABLE foo ADD COLUMN bar double precision;
>
> -- future rows could use the DEFAULT expression
> -- it also doesn't rewrite the table
> ALTER TABLE foo ALTER COLUMN bar SET DEFAULT random();
>
> -- it effectively rewrites the table
> -- all rows are built from one source node
> -- data will be the same on all nodes
> UPDATE foo SET bar = random();
I looked more into this. In order to support statements like "ALTER
TABLE foo ADD COLUMN bar double precision DEFAULT random();", we have
two potential solutions, but both of them are non-trivial to
implement:
1. As Euler pointed out, we could decompose the statement on the
publisher into multiple statements so that the table rewrite (using a
volatile function) is handled by a DML sub-command. The decomposition
requires changes in parse analysis/transform. We also need new logic to
assemble the decomposed DDL command strings back from the parse trees
so we can log them for logical replication.
2. Force skipping the table rewrite when executing the same command on the
subscriber, and let DML replication replicate the table rewrite from
the publisher. The problem is that table rewrites are not replicated at
all today, and it doesn't seem easy to just enable that for logical
replication. A table rewrite is an expensive operation involving a heap
file swap; details can be found in ATRewriteTables().
In light of this, I propose to temporarily block replication of such
DDL commands on the replication worker until we figure out a better
solution. This is implemented in patch
0008-Fail-replication-worker-on-DDL-command-that-rewrites.patch.
Notice that only DDL statements that rewrite the table using a VOLATILE
expression will be blocked. I don't see a problem with replicating
non-volatile expressions.
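For example (a minimal illustration using the same hypothetical table as
above):

-- blocked on the apply worker: the rewrite uses a VOLATILE expression,
-- so replaying it would produce different data on the subscriber
ALTER TABLE foo ADD COLUMN bar double precision DEFAULT random();

-- not blocked: the expression is non-volatile, so replaying the command
-- yields the same data on every node
ALTER TABLE foo ADD COLUMN baz double precision DEFAULT 0;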
Here is the github commit of the same patch:
https://github.com/zli236/postgres/commit/1e6115cb99a1286a61cb0a6a088f7476da29d0b9
> The ALTER TABLE ... ALTER COLUMN ... TYPE has a similar issue. This DDL command
> can be decomposed to avoid the rewrite. If you are changing the data type, in
> general, you add a new column and update all rows doing the proper conversion.
> (If you are updating in batches, you usually add a trigger to automatically
> adjust the new column value for INSERTs and UPDATEs.) Another case is when you
> are reducing the typmod (for example, varchar(100) to varchar(20)). In this
> case, the DDL command can be decomposed by removing the typmod information (ALTER
> TABLE ... ALTER COLUMN ... TYPE varchar) and replacing it with a CHECK
> constraint.
I tested ALTER TABLE ... ALTER COLUMN ... TYPE. It seems to be working
fine. Is there a particular case you're concerned about?
> --
> Euler Taveira
> EDB https://www.enterprisedb.com/
>
Regards,
Zheng Li
| Attachment | Content-Type | Size |
|---|---|---|
| 0008-Fail-replication-worker-on-DDL-command-that-rewrites.patch | application/octet-stream | 4.8 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-03 21:30:20 |
| Message-ID: | CAAD30UL5o=yiPDB6A5RUCFy0H8Oem9E2oH7jqS1O25Yv4V3FxA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> > >Another somewhat unrelated problem I see with this work is how to save
> > >recursion of the same command between nodes (when the involved nodes
> > >replicate DDLs). For DMLs, we can avoid that via replication origins
> > >as is being done in the patch proposed [1] but not sure how will we
> > >deal with that here?
> >
> > I'll need to investigate "recursion of the same command between
> > nodes", could you provide an example?
> >
>
> See email [1]. I think the same solution should work for your proposal
> as well because I see that LogLogicalMessage includes origin but it is
> better if you can confirm it.
>
> [1] - https://www.postgresql.org/message-id/flat/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ(at)mail(dot)gmail(dot)com
Yes, I applied the patches from [1] and can confirm that the solution
for "infinite recursion of the same command(DDL command in this case)
between nodes" works with my patches.
Regards,
Zheng
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com, Ajin Cherian <itsajin(at)gmail(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-06 13:24:14 |
| Message-ID: | CAA4eK1+6A72+3HNrbg12wGaAy=rYPDS+AFkbePtSj=h++1ir8Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Apr 8, 2022 at 5:04 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2022-Apr-08, Amit Kapila wrote:
>
> > On Thu, Mar 17, 2022 at 3:36 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> > For runtime conditions, one of the things you have mentioned in that
> > thread is to add schema name in the statement at the required places
> > which this patch deals with in a different way by explicitly sending
> > it along with the DDL statement.
>
> Hmm, ok. The point of the JSON-blob route is that the publisher sends a
> command representation that can be parsed/processed/transformed
> arbitrarily by the subscriber using generic rules; it should be trivial
> to use a JSON tool to change schema A to schema B in any arbitrary DDL
> command, and produce another working DDL command without having to know
> how to write that command specifically.
>
Attached is a set of two patches as an attempt to evaluate this approach.
The first patch provides functions to deparse DDL commands. Currently,
it is restricted to just a simple CREATE TABLE statement, the required
code is extracted from one of the patches posted in the thread [1].
The second patch allows replicating a simple CREATE TABLE DDL. To do
that we used an event trigger and DDL deparsing
facilities. While creating a publication, we register a command end
trigger that deparses the DDL as a JSON blob, and WAL logs it. The
event trigger is automatically removed at the time of drop
publication. The WALSender decodes the WAL and sends it downstream
similar to other DML commands. The subscriber then converts JSON back
to the DDL command string and executes it. In the subscriber, we also
add the newly added rel to pg_subscription_rel so that the DML changes
on the new table can be replicated without having to manually run
"ALTER SUBSCRIPTION ... REFRESH PUBLICATION". Some of the code related
to WAL logging and subscriber-side work is taken from the patch posted
by Zheng in this thread but there are quite a few changes in that as
we don't need schema, role, transaction vs. non-transactional
handling.
Note that for now, we have hacked Create Publication code such that
when the user specifies the "FOR ALL TABLES" clause, we invoke this
new functionality. So, this will work only for "FOR ALL TABLES"
publications. For example, we need to do the below to replicate the simple
Create Table command.
Publisher:
Create Publication pub1 For All Tables;
Subscriber:
Create Subscription sub1 Connection '...' Publication pub1;
Publisher:
Create Table t1(c1 int);
Subscriber:
\d should show t1.
As we have hacked CreatePublication function for this POC, the
regression tests are not passing but we can easily change it so that
we invoke new functionality with the syntax proposed in this thread or
with some other syntax and we shall do that in the next patch unless
this approach is not worth pursuing.
This POC is prepared by Ajin Cherian, Hou-San, and me.
Thoughts?
[1] - https://www.postgresql.org/message-id/20150215044814.GL3391%40alvh.no-ip.org
--
With Regards,
Amit Kapila.
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 62.6 KB |
| v1-0002-Support-DDL-replication.patch | application/octet-stream | 47.8 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-06 16:51:21 |
| Message-ID: | CAAD30UKWX1QAv_tnxZa3--bPGyoa_ifS5x64ESRC5Yga9je1vA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> Attached is a set of two patches as an attempt to evaluate this approach.
>
> The first patch provides functions to deparse DDL commands. Currently,
> it is restricted to just a simple CREATE TABLE statement, the required
> code is extracted from one of the patches posted in the thread [1].
>
> The second patch allows replicating a simple CREATE TABLE DDL. To do
> that we used an event trigger and DDL deparsing
> facilities. While creating a publication, we register a command end
> trigger that deparses the DDL as a JSON blob, and WAL logs it. The
> event trigger is automatically removed at the time of drop
> publication. The WALSender decodes the WAL and sends it downstream
> similar to other DML commands. The subscriber then converts JSON back
> to the DDL command string and executes it. In the subscriber, we also
> add the newly added rel to pg_subscription_rel so that the DML changes
> on the new table can be replicated without having to manually run
> "ALTER SUBSCRIPTION ... REFRESH PUBLICATION". Some of the code related
> to WAL logging and subscriber-side work is taken from the patch posted
> by Zheng in this thread but there are quite a few changes in that as
> we don't need schema, role, transaction vs. non-transactional
> handling.
>
> Note that for now, we have hacked Create Publication code such that
> when the user specifies the "FOR ALL TABLES" clause, we invoke this
> new functionality. So, this will work only for "FOR ALL TABLES"
> publications. For example, we need to do the below to replicate the simple
> Create Table command.
>
> Publisher:
> Create Publication pub1 For All Tables;
>
> Subscriber:
> Create Subscription sub1 Connection '...' Publication pub1;
>
> Publisher:
> Create Table t1(c1 int);
>
> Subscriber:
> \d should show t1.
>
> As we have hacked CreatePublication function for this POC, the
> regression tests are not passing but we can easily change it so that
> we invoke new functionality with the syntax proposed in this thread or
> with some other syntax and we shall do that in the next patch unless
> this approach is not worth pursuing.
>
> This POC is prepared by Ajin Cherian, Hou-San, and me.
>
> Thoughts?
>
> [1] - https://www.postgresql.org/message-id/20150215044814.GL3391%40alvh.no-ip.org
Hi,
Thanks for exploring this direction.
I read the deparsing thread and your patch. Here is my thought:
1. The main concern on maintainability of the deparsing code still
applies if we want to adapt it for DDL replication.
2. The search_path and role still need to be handled, in the deparsing
code. And I think it's actually more overhead to qualify every object
compared to just logging the search_path and enforcing it on the apply
worker.
3. I'm trying to understand if deparsing helps with edge cases like
"ALTER TABLE foo ADD COLUMN bar double precision DEFAULT random();". I
don't think it helps out of the box. The crux of separating table
rewrite and DDL still needs to be solved for such cases.
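To illustrate point 2 with a minimal example (the schema and table names
are made up):

-- on the publisher
SET search_path = app;
CREATE TABLE orders (id int);   -- creates app.orders
-- if only the literal command text is replayed on a subscriber whose
-- search_path is left at the default, the same statement creates
-- public.orders instead; deparsing avoids this by schema-qualifying the
-- object, while the alternative is to log and enforce the publisher's
-- search_path on the apply worker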
Regards,
Zheng
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-07 04:08:43 |
| Message-ID: | CAA4eK1K_HGzrA6gCHR43j0rnQSLBYOW2NNcdwXx27UiaAckiMg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, May 6, 2022 at 10:21 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> > Attached is a set of two patches as an attempt to evaluate this approach.
> >
>
> Thanks for exploring this direction.
>
> I read the deparsing thread and your patch. Here is my thought:
> 1. The main concern on maintainability of the deparsing code still
> applies if we want to adapt it for DDL replication.
>
I agree that it adds to our maintainability effort, like every time we
enhance any DDL or add a new DDL that needs to be replicated, we
probably need to change the deparsing code. But OTOH this approach
seems to provide better flexibility. So, in the long run, maybe the
effort is worthwhile. I am not completely sure at this stage which
approach is better but I thought it is worth evaluating this approach
as Alvaro and Robert seem to prefer this idea.
> 2. The search_path and role still need to be handled, in the deparsing
> code. And I think it's actually more overhead to qualify every object
> compared to just logging the search_path and enforcing it on the apply
> worker.
>
But doing it in the deparsing code will have the benefit that the
other plugins won't have to develop similar logic.
--
With Regards,
Amit Kapila.
| From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-08 07:09:27 |
| Message-ID: | CAFiTN-sFwwk963MkeUipVn3Me8JdpH43Kd=nNbJMnv__jwXPcw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Sat, May 7, 2022 at 9:38 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, May 6, 2022 at 10:21 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> >
> > > Attached is a set of two patches as an attempt to evaluate this approach.
> > >
> >
> > Thanks for exploring this direction.
> >
> > I read the deparsing thread and your patch. Here is my thought:
> > 1. The main concern on maintainability of the deparsing code still
> > applies if we want to adapt it for DDL replication.
> >
>
> I agree that it adds to our maintainability effort, like every time we
> enhance any DDL or add a new DDL that needs to be replicated, we
> probably need to change the deparsing code. But OTOH this approach
> seems to provide better flexibility. So, in the long run, maybe the
> effort is worthwhile. I am not completely sure at this stage which
> approach is better but I thought it is worth evaluating this approach
> as Alvaro and Robert seem to prefer this idea.
+1, IMHO with deparsing logic it would be easy to handle the mixed DDL
commands like ALTER TABLE REWRITE. But the only thing is that we will
have to write the deparsing code for all the utility commands so there
will be a huge amount of new code to maintain.
> > 2. The search_path and role still need to be handled, in the deparsing
> > code. And I think it's actually more overhead to qualify every object
> > compared to just logging the search_path and enforcing it on the apply
> > worker.
> >
>
> But doing it in the deparsing code will have the benefit that the
> other plugins won't have to develop similar logic.
Right, this makes sense.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
| From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
|---|---|
| To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-09 07:16:09 |
| Message-ID: | CALj2ACXFnvxm-AF4kMN_ofQKN9zFNuvrYqO8fbBTN3hYagxxtA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Sun, May 8, 2022 at 12:39 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Sat, May 7, 2022 at 9:38 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, May 6, 2022 at 10:21 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > >
> > > > Attached is a set of two patches as an attempt to evaluate this approach.
> > > >
> > >
> > > Thanks for exploring this direction.
> > >
> > > I read the deparsing thread and your patch. Here is my thought:
> > > 1. The main concern on maintainability of the deparsing code still
> > > applies if we want to adapt it for DDL replication.
> > >
> >
> > I agree that it adds to our maintainability effort, like every time we
> > enhance any DDL or add a new DDL that needs to be replicated, we
> > probably need to change the deparsing code. But OTOH this approach
> > seems to provide better flexibility. So, in the long run, maybe the
> > effort is worthwhile. I am not completely sure at this stage which
> > approach is better but I thought it is worth evaluating this approach
> > as Alvaro and Robert seem to prefer this idea.
>
> +1, IMHO with deparsing logic it would be easy to handle the mixed DDL
> commands like ALTER TABLE REWRITE. But the only thing is that we will
> have to write the deparsing code for all the utility commands so there
> will be a huge amount of new code to maintain.
I haven't gone through the entire thread, just trying to understand
the need of deparsing logic - do we need the DDL query text to be sent
to subscribers? If yes, why can't we WAL log the DDL query text as-is
and have logical decoding send it to subscribers, as-is or in a
modified form? Or log the DDLs to a separate log on disk?
Regards,
Bharath Rupireddy.
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
| Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-09 08:56:03 |
| Message-ID: | CAA4eK1JHmWsu4cN074=SPQqp8iKQuwYyJGjQKwz5OvTVJ0-6Tw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Mon, May 9, 2022 at 12:46 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Sun, May 8, 2022 at 12:39 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Sat, May 7, 2022 at 9:38 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Fri, May 6, 2022 at 10:21 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > > >
> > > > > Attached is a set of two patches as an attempt to evaluate this approach.
> > > > >
> > > >
> > > > Thanks for exploring this direction.
> > > >
> > > > I read the deparsing thread and your patch. Here is my thought:
> > > > 1. The main concern on maintainability of the deparsing code still
> > > > applies if we want to adapt it for DDL replication.
> > > >
> > >
> > > I agree that it adds to our maintainability effort, like every time we
> > > enhance any DDL or add a new DDL that needs to be replicated, we
> > > probably need to change the deparsing code. But OTOH this approach
> > > seems to provide better flexibility. So, in the long run, maybe the
> > > effort is worthwhile. I am not completely sure at this stage which
> > > approach is better but I thought it is worth evaluating this approach
> > > as Alvaro and Robert seem to prefer this idea.
> >
> > +1, IMHO with deparsing logic it would be easy to handle the mixed DDL
> > commands like ALTER TABLE REWRITE. But the only thing is that we will
> > have to write the deparsing code for all the utility commands so there
> > will be a huge amount of new code to maintain.
>
> I haven't gone through the entire thread, just trying to understand
> the need of deparsing logic
>
It provides more flexibility; for example, see [1]. We can do things
like identifying the right schema for an object in a way that
individual plugins don't need to implement specific handling. I think
the point Dilip is highlighting about table rewrites is explained in
email [2]. The point is that sometimes we can't replay the DDL
statement as-is on the subscriber, and deparsing might help in some of
those cases even if not all. At this stage, we are just evaluating both
approaches.
[1] - https://www.postgresql.org/message-id/202204081134.6tcmf5cxl3sz%40alvherre.pgsql
[2] - https://www.postgresql.org/message-id/3c646317-df34-4cb3-9365-14abeada6587%40www.fastmail.com
--
With Regards,
Amit Kapila.
| From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
|---|---|
| To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-09 09:05:12 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On 2022-May-08, Dilip Kumar wrote:
> On Sat, May 7, 2022 at 9:38 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > I agree that it adds to our maintainability effort, like every time we
> > enhance any DDL or add a new DDL that needs to be replicated, we
> > probably need to change the deparsing code. But OTOH this approach
> > seems to provide better flexibility. So, in the long run, maybe the
> > effort is worthwhile. I am not completely sure at this stage which
> > approach is better but I thought it is worth evaluating this approach
> > as Alvaro and Robert seem to prefer this idea.
>
> +1, IMHO with deparsing logic it would be easy to handle the mixed DDL
> commands like ALTER TABLE REWRITE. But the only thing is that we will
> have to write the deparsing code for all the utility commands so there
> will be a huge amount of new code to maintain.
Actually, the largest stumbling block on this, IMO, is having a good
mechanism to verify that all DDLs are supported. The deparsing code
itself is typically not very bad, as long as we have a sure way to twist
every contributor's hand into writing it, which is what an automated
test mechanism would give us.
The code in test_ddl_deparse is a pretty lame start, not nearly good
enough by a thousand miles. My real intention was to have a test
harness that would first run a special SQL script to install DDL
capture, then run all the regular src/test/regress scripts, and then at
the end ensure that all the DDL scripts were properly reproduced -- for
example transmit them to another database, replay them there, and dump
both databases and compare them. However, there were challenges which I
no longer remember and we were unable to complete this, and we are where
we are.
Thanks for rebasing that old code, BTW.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, japin <japinli(at)hotmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-10 07:02:05 |
| Message-ID: | CAD21AoDThyJBFfjjLVFnfMR5L3BfPTWYrotaSJq-=OE3P0GVeA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, Apr 13, 2022 at 6:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Apr 13, 2022 at 2:38 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Tue, Apr 12, 2022 at 4:25 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > > The *initial* DDL replication is a different problem than DDL replication. The
> > > > former requires a snapshot to read the current catalog data and build a CREATE
> > > > command as part of the subscription process. The subsequent DDLs in that object
> > > > will be handled by a different approach that is being discussed here.
> > > >
> > >
> > > I think they are not completely independent because of the current way
> > > to do initial sync followed by replication. The initial sync and
> > > replication need some mechanism to ensure that one of those doesn't
> > > overwrite the work done by the other. Now, the initial idea and patch
> > > can be developed separately but I think both the patches have some
> > > dependency.
> >
> > I agree with the point that their design can not be completely
> > independent. They have some logical relationship of what schema will
> > be copied by the initial sync and where is the exact boundary from
> > which we will start sending as replication. And suppose first we only
> > plan to implement the replication part then how the user will know
> > what all schema user has to create and what will be replicated using
> > DDL replication? Suppose the user takes a dump and copies all the
> > schema and then creates the subscription, then how are we going to
> > handle the DDL concurrent to the subscription command?
> >
>
> Right, I also don't see how it can be done in the current
> implementation. So, I think even if we want to develop these two as
> separate patches they need to be integrated to make the solution
> complete.
It would be better to develop them separately in terms of development
speed but, yes, we perhaps need to integrate them at some points.
I think that the initial DDL replication can be done when the
relation's state is SUBREL_STATE_INIT. That is, at the very beginning
of the table synchronization, the syncworker copies the table schema
somehow, then starts the initial data copy. After that, syncworker or
applyworker applies DML/DDL changes while catching up and streaming
changes, respectively. Probably we can have it optional whether to
copy schema only, data only, or both.
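For example, that could eventually surface as subscription options along
these lines (copy_data exists today; copy_schema is purely hypothetical):

CREATE SUBSCRIPTION sub1 CONNECTION 'host=pub dbname=postgres'
    PUBLICATION pub1
    WITH (copy_schema = true, copy_data = true);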
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, japin <japinli(at)hotmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-10 09:27:28 |
| Message-ID: | CAA4eK1+3YkeSZeZAeB7no0fSQHSWmM_wZeJPZvbLCTE-mAUY=Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, May 10, 2022 at 12:32 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Apr 13, 2022 at 6:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Apr 13, 2022 at 2:38 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Apr 12, 2022 at 4:25 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > > The *initial* DDL replication is a different problem than DDL replication. The
> > > > > former requires a snapshot to read the current catalog data and build a CREATE
> > > > > command as part of the subscription process. The subsequent DDLs in that object
> > > > > will be handled by a different approach that is being discussed here.
> > > > >
> > > >
> > > > I think they are not completely independent because of the current way
> > > > to do initial sync followed by replication. The initial sync and
> > > > replication need some mechanism to ensure that one of those doesn't
> > > > overwrite the work done by the other. Now, the initial idea and patch
> > > > can be developed separately but I think both the patches have some
> > > > dependency.
> > >
> > > I agree with the point that their design can not be completely
> > > independent. They have some logical relationship of what schema will
> > > be copied by the initial sync and where is the exact boundary from
> > > which we will start sending as replication. And suppose first we only
> > > plan to implement the replication part then how the user will know
> > > what all schema user has to create and what will be replicated using
> > > DDL replication? Suppose the user takes a dump and copies all the
> > > schema and then creates the subscription, then how are we going to
> > > handle the DDL concurrent to the subscription command?
> > >
> >
> > Right, I also don't see how it can be done in the current
> > implementation. So, I think even if we want to develop these two as
> > separate patches they need to be integrated to make the solution
> > complete.
>
> It would be better to develop them separately in terms of development
> speed but, yes, we perhaps need to integrate them at some points.
>
> I think that the initial DDL replication can be done when the
> relation's state is SUBREL_STATE_INIT. That is, at the very beginning
> of the table synchronization, the syncworker copies the table schema
> somehow, then starts the initial data copy. After that, syncworker or
> applyworker applies DML/DDL changes while catching up and streaming
> changes, respectively. Probably we can have it optional whether to
> copy schema only, data only, or both.
>
This sounds okay for copying the table schema, but we can have other
objects like functions, procedures, views, etc. So, we may need an
altogether separate mechanism to copy all the published objects.
--
With Regards,
Amit Kapila.
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-10 11:33:49 |
| Message-ID: | CAFPTHDbEG-TCAAcUKQTP+hM2rFOCVn4RbkqHm4=9ua-ZY_rRNw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, May 6, 2022 at 11:24 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> As we have hacked CreatePublication function for this POC, the
> regression tests are not passing but we can easily change it so that
> we invoke new functionality with the syntax proposed in this thread or
> with some other syntax and we shall do that in the next patch unless
> this approach is not worth pursuing.
>
> This POC is prepared by Ajin Cherian, Hou-San, and me.
>
> Thoughts?
>
> [1] - https://www.postgresql.org/message-id/20150215044814.GL3391%40alvh.no-ip.org
I have updated Amit's patch by including a publication action "create"
when creating a publication, which is turned off by default.
Now the 'make check' tests pass. I also fixed a problem that failed to
create tables when the table has a primary key.
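For reference, enabling the new action presumably looks something like
this with the patch applied (the exact option spelling is an assumption
based on the description above):

CREATE PUBLICATION pub1 FOR ALL TABLES
    WITH (publish = 'insert, update, delete, truncate, create');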
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0001-Fix-race-in-032_relfilenode_reuse.pl.patch | application/octet-stream | 1.7 KB |
| v2-0002-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 63.9 KB |
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-10 12:39:37 |
| Message-ID: | CAA4eK1Li15CyyH=hJhQbSdpaeA5yyt4WCU8S0SWiGxctLyefLg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, May 10, 2022 at 4:59 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Fri, May 6, 2022 at 11:24 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > As we have hacked CreatePublication function for this POC, the
> > regression tests are not passing but we can easily change it so that
> > we invoke new functionality with the syntax proposed in this thread or
> > with some other syntax and we shall do that in the next patch unless
> > this approach is not worth pursuing.
> >
> > This POC is prepared by Ajin Cherian, Hou-San, and me.
> >
> > Thoughts?
> >
> > [1] - https://www.postgresql.org/message-id/20150215044814.GL3391%40alvh.no-ip.org
>
> I have updated Amit's patch by including a public action "create" when
> creating publication which is turned off by default.
> Now the 'make check' tests pass. I also fixed a problem that failed to
> create tables when the table has a primary key.
>
Is this the correct set of patches? I don't see a DDL support patch.
--
With Regards,
Amit Kapila.
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-10 15:17:27 |
| Message-ID: | CAFPTHDYza7PZqKdkyoRUpJTDC26x_XZtJnFwtiyy9C_HuUS7OA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, May 10, 2022 at 9:33 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Fri, May 6, 2022 at 11:24 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > As we have hacked CreatePublication function for this POC, the
> > regression tests are not passing but we can easily change it so that
> > we invoke new functionality with the syntax proposed in this thread or
> > with some other syntax and we shall do that in the next patch unless
> > this approach is not worth pursuing.
> >
> > This POC is prepared by Ajin Cherian, Hou-San, and me.
> >
> > Thoughts?
> >
> > [1] - https://www.postgresql.org/message-id/20150215044814.GL3391%40alvh.no-ip.org
>
> I have updated Amit's patch by including a publication action "create"
> when creating a publication, which is turned off by default.
> Now the 'make check' tests pass. I also fixed a problem that failed to
> create tables when the table has a primary key.
Sorry, I sent the wrong patches. Here is the correct one.
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 62.6 KB |
| v2-0002-Support-DDL-replication.patch | application/octet-stream | 104.5 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-10 15:48:01 |
| Message-ID: | CAAD30UKy9GQ8o0VTB7H1ymHzj85NgpTjF1xR3rTd65rOsufGZA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> > > I agree that it adds to our maintainability effort, like every time we
> > > enhance any DDL or add a new DDL that needs to be replicated, we
> > > probably need to change the deparsing code. But OTOH this approach
> > > seems to provide better flexibility. So, in the long run, maybe the
> > > effort is worthwhile. I am not completely sure at this stage which
> > > approach is better but I thought it is worth evaluating this approach
> > > as Alvaro and Robert seem to prefer this idea.
> >
> > +1, IMHO with deparsing logic it would be easy to handle the mixed DDL
> > commands like ALTER TABLE REWRITE. But the only thing is that we will
> > have to write the deparsing code for all the utility commands so there
> > will be a huge amount of new code to maintain.
>
> Actually, the largest stumbling block on this, IMO, is having a good
> mechanism to verify that all DDLs are supported. The deparsing code
> itself is typically not very bad, as long as we have a sure way to twist
> every contributor's hand into writing it, which is what an automated
> test mechanism would give us.
>
> The code in test_ddl_deparse is a pretty lame start, not nearly good
> enough by a thousand miles. My real intention was to have a test
> harness that would first run a special SQL script to install DDL
> capture, then run all the regular src/test/regress scripts, and then at
> the end ensure that all the DDL scripts were properly reproduced -- for
> example transmit them to another database, replay them there, and dump
> both databases and compare them. However, there were challenges which I
> no longer remember and we were unable to complete this, and we are where
> we are.
>
> Thanks for rebasing that old code, BTW.
I agree that deparsing could be a very useful utility on its own. Not
only for SQL command replication between PostgreSQL servers, but also
potentially feasible for SQL command replication between PostgreSQL and
other database systems. For example, one could assemble the json
representation of the SQL parse tree back to a SQL command that can be
run in MySQL. But that requires different assembling rules and code
for different database systems. If we're envisioning this kind of
flexibility that the deparsing utility can offer, then I think it's
better to develop the deparsing utility as an extension itself.
Regards,
Zheng
| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-11 04:45:24 |
| Message-ID: | CAD21AoDDwPoFRa=41vuR7kj_DMbwxqyCtv5+ZFvsmGw-R1EXLA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, May 11, 2022 at 1:01 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> > > > I agree that it adds to our maintainability effort, like every time we
> > > > enhance any DDL or add a new DDL that needs to be replicated, we
> > > > probably need to change the deparsing code. But OTOH this approach
> > > > seems to provide better flexibility. So, in the long run, maybe the
> > > > effort is worthwhile. I am not completely sure at this stage which
> > > > approach is better but I thought it is worth evaluating this approach
> > > > as Alvaro and Robert seem to prefer this idea.
> > >
> > > +1, IMHO with deparsing logic it would be easy to handle the mixed DDL
> > > commands like ALTER TABLE REWRITE. But the only thing is that we will
> > > have to write the deparsing code for all the utility commands so there
> > > will be a huge amount of new code to maintain.
> >
> > Actually, the largest stumbling block on this, IMO, is having a good
> > mechanism to verify that all DDLs are supported. The deparsing code
> > itself is typically not very bad, as long as we have a sure way to twist
> > every contributor's hand into writing it, which is what an automated
> > test mechanism would give us.
> >
> > The code in test_ddl_deparse is a pretty lame start, not nearly good
> > enough by a thousand miles. My real intention was to have a test
> > harness that would first run a special SQL script to install DDL
> > capture, then run all the regular src/test/regress scripts, and then at
> > the end ensure that all the DDL scripts were properly reproduced -- for
> > example transmit them to another database, replay them there, and dump
> > both databases and compare them. However, there were challenges which I
> > no longer remember and we were unable to complete this, and we are where
> > we are.
> >
> > Thanks for rebasing that old code, BTW.
>
> I agree that deparsing could be a very useful utility on its own. Not
> only for SQL command replication between PostgreSQL servers, but also
> potentially feasible for SQL command replication between PostgreSQL and
> other database systems. For example, one could assemble the json
> representation of the SQL parse tree back to a SQL command that can be
> run in MySQL. But that requires different assembling rules and code
> for different database systems. If we're envisioning this kind of
> flexibility that the deparsing utility can offer, then I think it's
> better to develop the deparsing utility as an extension itself.
IIUC event triggers can already provide such flexibility. That is, one
could create an extension that creates an event trigger, and the event
trigger function deparses the SQL parse tree into a SQL command that
can be run in MySQL. Given that such flexibility already exists, I'm
fine with having a built-in deparsing utility in core as well.
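A minimal sketch of that idea using only the existing event trigger
facilities (the function body is a placeholder; a real extension would
deparse and ship the command instead of raising a notice):

CREATE FUNCTION capture_ddl() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
        -- a real extension would deparse r.command here and hand the result
        -- (e.g. a JSON representation) to its transport of choice
        RAISE NOTICE 'captured %: %', r.command_tag, r.object_identity;
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER capture_ddl ON ddl_command_end
    EXECUTE FUNCTION capture_ddl();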
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-11 06:21:59 |
| Message-ID: | CAFPTHDa1kiF8wqCt_z6_v0U-1FHujTjJL0k0YTCRNFSe82wpnw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, May 11, 2022 at 1:17 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> > I have updated Amit's patch by including a public action "create" when
> > creating publication which is turned off by default.
> > Now the 'make check' tests pass. I also fixed a problem that failed to
> > create tables when the table has a primary key.
>
I have rearranged the files and removed
src/bin/pg_waldump/logicalddlmsgdesc.c,
as it is a generated file.
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 63.9 KB |
| v3-0002-Support-DDL-replication.patch | application/octet-stream | 102.2 KB |
| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-11 07:38:56 |
| Message-ID: | CAD21AoAX_xiO03hXSY2QfbcKT0XiUvtnzTjy+NRJ_EcgBa5B3A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Mon, May 9, 2022 at 6:05 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2022-May-08, Dilip Kumar wrote:
>
> > On Sat, May 7, 2022 at 9:38 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > > I agree that it adds to our maintainability effort, like every time we
> > > enhance any DDL or add a new DDL that needs to be replicated, we
> > > probably need to change the deparsing code. But OTOH this approach
> > > seems to provide better flexibility. So, in the long run, maybe the
> > > effort is worthwhile. I am not completely sure at this stage which
> > > approach is better but I thought it is worth evaluating this approach
> > > as Alvaro and Robert seem to prefer this idea.
> >
> > +1, IMHO with deparsing logic it would be easy to handle the mixed DDL
> > commands like ALTER TABLE REWRITE. But the only thing is that we will
> > have to write the deparsing code for all the utility commands so there
> > will be a huge amount of new code to maintain.
>
> Actually, the largest stumbling block on this, IMO, is having a good
> mechanism to verify that all DDLs are supported. The deparsing code
> itself is typically not very bad, as long as we have a sure way to twist
> every contributor's hand into writing it, which is what an automated
> test mechanism would give us.
Agreed.
>
> The code in test_ddl_deparse is a pretty lame start, not nearly good
> enough by a thousand miles. My real intention was to have a test
> harness that would first run a special SQL script to install DDL
> capture, then run all the regular src/test/regress scripts, and then at
> the end ensure that all the DDL scripts were properly reproduced -- for
> example transmit them to another database, replay them there, and dump
> both databases and compare them. However, there were challenges which I
> no longer remember and we were unable to complete this, and we are where
> we are.
I think the regression test suite improvements over the past few years
make it easier to implement such regression tests in order to check not
only the existing commands but also future changes. I'll try it if no
one is working on it, and we'll see if there are challenges.
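As a rough sketch of the "install DDL capture" step under discussion,
assuming the harness would later replay the captured commands on a second
database and compare pg_dump output (all object names below are
hypothetical, and a real harness would capture the deparsed form rather
than the raw statement text):

CREATE TABLE ddl_capture (id serial PRIMARY KEY, tag text, ddl text);

CREATE FUNCTION capture_ddl() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- record the command tag and the originating statement text
    INSERT INTO ddl_capture (tag, ddl)
    SELECT command_tag, current_query()
    FROM pg_event_trigger_ddl_commands();
END;
$$;

CREATE EVENT TRIGGER capture_ddl_trigger ON ddl_command_end
    EXECUTE FUNCTION capture_ddl();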
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-11 09:15:18 |
| Message-ID: | CAA4eK1L2NvQAFSZ8Tgo6dsPEYb3-wmVYpA4gieGO8Ym6U4RZZg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, May 11, 2022 at 1:09 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, May 9, 2022 at 6:05 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >
> > On 2022-May-08, Dilip Kumar wrote:
>
> >
> > The code in test_ddl_deparse is a pretty lame start, not nearly good
> > enough by a thousand miles. My real intention was to have a test
> > harness that would first run a special SQL script to install DDL
> > capture, then run all the regular src/test/regress scripts, and then at
> > the end ensure that all the DDL scripts were properly reproduced -- for
> > example transmit them to another database, replay them there, and dump
> > both databases and compare them. However, there were challenges which I
> > no longer remember and we were unable to complete this, and we are where
> > we are.
>
> I think the regression test suite improvements in these few years make
> it easier to implement such regression tests in order to check not
> only the existing commands but also future changes. I'll try it if no
> one is working on it, and let us see if there are challenges.
>
I think it is a good idea to give it a try. However, one point to note
in this regard is that if we decide to use deparsing for DDL logical
replication then the tests for logical replication will automatically
test all the deparsing code.
--
With Regards,
Amit Kapila.
| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-11 12:55:00 |
| Message-ID: | CAD21AoAtew5AxdigsOWbv=Lav2vy-qA6VoXiNgAbxeLMnBU=Rg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, May 11, 2022 at 6:15 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, May 11, 2022 at 1:09 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, May 9, 2022 at 6:05 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > >
> > > On 2022-May-08, Dilip Kumar wrote:
> >
> > >
> > > The code in test_ddl_deparse is a pretty lame start, not nearly good
> > > enough by a thousand miles. My real intention was to have a test
> > > harness that would first run a special SQL script to install DDL
> > > capture, then run all the regular src/test/regress scripts, and then at
> > > the end ensure that all the DDL scripts were properly reproduced -- for
> > > example transmit them to another database, replay them there, and dump
> > > both databases and compare them. However, there were challenges which I
> > > no longer remember and we were unable to complete this, and we are where
> > > we are.
> >
> > I think the regression test suite improvements in these few years make
> > it easier to implement such regression tests in order to check not
> > only the existing commands but also future changes. I'll try it if no
> > one is working on it, and let us see if there are challenges.
> >
>
> I think it is a good idea to give it a try. However, one point to note
> in this regard is that if we decide to use deparsing for DDL logical
> replication then the tests for logical replication will automatically
> test all the deparsing code.
Do you mean that in the tests for logical replication we run regression
tests on the publisher and check whether relations are
created/altered/dropped as expected on the subscriber? If we rely on
the tests for logical replication, I think logical replication will have
to support all future DDL changes/features.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-12 02:54:33 |
| Message-ID: | CAA4eK1+tyJnXQZE13W1siMq46Bo53bBmA642WYkri4M-zAOZbg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, May 11, 2022 at 6:25 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, May 11, 2022 at 6:15 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, May 11, 2022 at 1:09 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Mon, May 9, 2022 at 6:05 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > > >
> > > >
> > > > The code in test_ddl_deparse is a pretty lame start, not nearly good
> > > > enough by a thousand miles. My real intention was to have a test
> > > > harness that would first run a special SQL script to install DDL
> > > > capture, then run all the regular src/test/regress scripts, and then at
> > > > the end ensure that all the DDL scripts were properly reproduced -- for
> > > > example transmit them to another database, replay them there, and dump
> > > > both databases and compare them. However, there were challenges which I
> > > > no longer remember and we were unable to complete this, and we are where
> > > > we are.
> > >
> > > I think the regression test suite improvements in these few years make
> > > it easier to implement such regression tests in order to check not
> > > only the existing commands but also future changes. I'll try it if no
> > > one is working on it, and let us see if there are challenges.
> > >
> >
> > I think it is a good idea to give it a try. However, one point to note
> > in this regard is that if we decide to use deparsing for DDL logical
> > replication then the tests for logical replication will automatically
> > test all the deparsing code.
>
> Do you mean that in tests for logical replication we run regression
> tests on the publisher and check if relations are
> created/altered/dropped expectedly on the subscriber?
>
Yes.
> If we rely on
> tests for logical replication, I think logical replication will have
> to support all future DDL changes/features.
>
Agreed, so we can't completely rely on logical replication tests.
--
With Regards,
Amit Kapila.
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-13 21:01:33 |
| Message-ID: | CAAD30U+CRgUgkAg33KzNBKwCbsgiSc5z3NYvxNzEfS0Zg2S1WA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> > 4. The handling related to partition tables seems missing because, on
> > the subscriber-side, it always creates a relation entry in
> > pg_subscription_rel which won't work. Check its interaction with
> > publish_via_partition_root.
>
> I will test it out.
Hi,
Patch 0010 properly handles partitioned table creation on the apply
worker. Whether a replicated partitioned table should be added to the
pg_subscription_rel catalog depends on the publication's
publish_via_partition_root setting. Thus we need to connect
to the source DB and check whether the partitioned table is in
pg_catalog.pg_publication_tables after the apply worker creates the
partitioned table.
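For reference, the check described above would be roughly a query like the
following run against the publisher (the publication and table names here
are just placeholders):

SELECT 1
FROM pg_catalog.pg_publication_tables
WHERE pubname = 'mypub'
  AND schemaname = 'public'
  AND tablename = 'part_tab';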
Thanks to Borui Yang for enabling and testing replication of DDL type
T_RenameStmt in patch 0009.
I've also rebased all the patches. A GitHub branch with the same changes
can be found here:
https://github.com/zli236/postgres/commits/ddl_replication
Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| 0005-Support-replication-of-CREATE-.-AS-.-and-SELECT-.-IN.patch | application/octet-stream | 8.0 KB |
| 0002-Support-logical-logging-and-decoding-of-DDL-command-.patch | application/octet-stream | 61.4 KB |
| 0003-Logical-replication-of-DDL-messages.patch | application/octet-stream | 41.2 KB |
| 0004-Enable-replication-of-CREATE-MATERIALIZED-VIEW-AS-st.patch | application/octet-stream | 4.1 KB |
| 0006-Add-couple-tests-for-DDL-replication-case.patch | application/octet-stream | 4.5 KB |
| 0009-Support-replication-of-DDL-type-T_RenameStmt-table-r.patch | application/octet-stream | 8.3 KB |
| 0001-Define-DDL-replication-levels-via-the-CREATE-PUBLICA.patch | application/octet-stream | 65.0 KB |
| 0007-Enable-logging-and-replication-of-DDLs-executed-insi.patch | application/octet-stream | 17.5 KB |
| 0008-Fail-replication-worker-on-DDL-command-that-rewrites.patch | application/octet-stream | 4.8 KB |
| 0010-Handle-partitioned-table-creation-on-the-apply-worke.patch | application/octet-stream | 11.5 KB |
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Euler Taveira <euler(at)eulerto(dot)com> |
| Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-23 11:54:31 |
| Message-ID: | CAA4eK1LPX71LzSye6ac-OOx6=wWq7U2PL-1tBf3zvncSko8Srw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, Apr 14, 2022 at 7:45 PM Euler Taveira <euler(at)eulerto(dot)com> wrote:
>
> You should forbid it. Unless you can decompose the command into multiple SQL
> commands to make it a safe operation for logical replication.
>
> Let's say you want to add a column with a volatile default.
>
> ALTER TABLE foo ADD COLUMN bar double precision DEFAULT random();
>
> If you replicate the DDL command as is, you will have different data
> downstream. You should forbid it. However, this operation can be supported if
> the DDL command is decomposed in multiple steps.
>
> -- add a new column without DEFAULT to avoid rewrite
> ALTER TABLE foo ADD COLUMN bar double precision;
>
> -- future rows could use the DEFAULT expression
> -- it also doesn't rewrite the table
> ALTER TABLE foo ALTER COLUMN bar SET DEFAULT random();
>
> -- it effectively rewrites the table
> -- all rows are built from one source node
> -- data will be the same on all nodes
> UPDATE foo SET bar = random();
>
While thinking about this, I see there is more to it. Say we are
able to decompose/split the DDL command with the help of the deparser;
do we want to update the additional rows on the subscriber that don't
exist on the publisher? For example,
A table on the publisher side has rows:
ddl_test(foo)
a
----
1
2
The same table on the subscriber side has rows:
ddl_test(foo)
a
----
1
2
3
4
Now, say the user has added a bar column with "ALTER TABLE foo ADD
COLUMN bar double precision NOT NULL DEFAULT random();". If we compare
this with replication of DMLs like (UPDATE ddl_test SET bar = random();),
the replication won't update the rows with values (3 and 4) on the
subscriber as they don't exist on the publisher. However, if we follow
the same approach for DDL replication of ALTER, it will fail because of
the NOT NULL constraint. So, it seems we should update all the existing
rows on the subscriber to make replication of such constraints
successful. It seems that IBM's replication solution allows replication
of such DDLs and does update all existing rows on the target table [1][2].
I think it would be tricky to update the rows on the subscriber that
don't exist on the publisher, as we need to distinguish/find such
rows during apply, but I think we can find some way to achieve this if
we decide to go this way.
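To make the failure mode concrete, here is a sketch that extends the
decomposition quoted above with the NOT NULL constraint from the example
(using the table from the example above), while the subscriber has the
extra rows 3 and 4:

ALTER TABLE foo ADD COLUMN bar double precision;        -- replicated, succeeds
ALTER TABLE foo ALTER COLUMN bar SET DEFAULT random();  -- replicated, succeeds
UPDATE foo SET bar = random();                          -- replicated as row-level changes;
                                                        -- only rows 1 and 2 exist on the
                                                        -- publisher, so rows 3 and 4 on the
                                                        -- subscriber keep NULL
ALTER TABLE foo ALTER COLUMN bar SET NOT NULL;          -- fails on the subscriber:
                                                        -- rows 3 and 4 still have NULL bar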
We could also conclude that we want to restrict the replication of ALTER
TABLE for such cases, but as other databases seem to support this, I
think it is worth trying to support such an operation. If it turns out
to be too complex or not at all feasible, then we can always exclude it
from the first version.
Thoughts?
[1] - https://www.ibm.com/docs/en/idr/10.2.1?topic=replication-adding-existing-columns-subscription-unidirectional
(... Columns are added to the target table with the same data type,
null characteristic, and default value as the matching columns in the
source table... Rows that existed in the target table before the new
column is added will have a NULL or default value for the new column. ...)
[2] - https://www.ibm.com/docs/en/idr/11.4.0?topic=replication-alter-add-column-command-multidirectional
--
With Regards,
Amit Kapila.
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Euler Taveira <euler(at)eulerto(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-23 15:40:17 |
| Message-ID: | CAAD30UKJpnxXQBgFunTbw3vXRcaPSJEDHJ6GPpV3c-CTst0WUQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> Now, say, the user has added a bar column with "ALTER TABLE foo ADD
> COLUMN bar double precision NOT NULL DEFAULT random();" If we compare
> with replication of DMLs like (UPDATE ddl_test SET bar = random();),
> the replication won't update rows with values (3 and 4) on subscriber
> as they don't exist on the publisher. However, if we follow the same
> here for DDL replication of Alter, it will fail because of NOT NULL
> constraint. So, it seems we should update all the existing rows on the
> subscriber to make replication of such constraints successful. It
> seems that IBM's replication solution allows replication of such DDLs
> and does update all existing rows on the target table [1][2].
>
> I think it would be tricky to update the rows in subscriber that
> doesn't exist in the publisher as we need to distinguish/find such
> rows during apply but I think we can find some way to achieve this if
> we decide to go this way.
I think the behavior that makes the most sense here is to replicate the
default values of the new column for existing rows from the publisher, and
generate default values for the additional rows on the subscriber.
> We can also conclude that we want to restrict the replication of Alter
> Table for such cases but as other databases seem to support this, I
> think it is worth trying to support such an operation. If it turns out
> to be too complex or not at all feasible then we can always exclude it
> from the first version.
I think it might be OK to exclude such cases (fail the apply worker on
occurrence) for the first version. I've spent some time on it and it seems
to be a high-effort, low-reward task at the moment.
Regards,
Zheng
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-26 05:45:47 |
| Message-ID: | CAFPTHDaBodoZ5c7U1uyokbvq+zUvhJ4ps-7H66nHGw45UnO0OQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, May 10, 2022 at 9:33 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Fri, May 6, 2022 at 11:24 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > As we have hacked CreatePublication function for this POC, the
> > regression tests are not passing but we can easily change it so that
> > we invoke new functionality with the syntax proposed in this thread or
> > with some other syntax and we shall do that in the next patch unless
> > this approach is not worth pursuing.
> >
> > This POC is prepared by Ajin Cherian, Hou-San, and me.
> >
> > Thoughts?
> >
> > [1] - https://www.postgresql.org/message-id/20150215044814.GL3391%40alvh.no-ip.org
>
> I have updated Amit's patch by including a public action "create" when
> creating publication which is turned off by default.
> Now the 'make check' tests pass. I also fixed a problem that failed to
> create tables when the table has a primary key.
I have updated this "deparse" patch-set to include support for 'drop table':
Only drop commands for tables in the publication are replicated.
This is achieved with two event trigger handlers, 'command start' and
'command end', for drop table commands.
The 'command start' event handler logs a ddl message with the relids
of the tables that are dropped, which the output plugin (pgoutput)
stores in its internal data structure after verifying that it is for a
table that is part of the publication. Later, the 'command end' event
handler sends the actual drop message. Pgoutput, on receiving the
command end, sends out the drop command only if it is for one of the
relids marked for deletion. The reason we have to do this is that,
once the logical decoder receives the 'command end' message,
the relid of the table is no longer valid, as it has been deleted as
part of the invalidations received for the drop table command.
It is then no longer possible to verify whether the table is part of the
publication list or not. To make this possible, I have added two more
elements to the ddl xlog and ddl message: (relid and cmdtype).
We could also have handled all this on the subscriber side,
but that would mean sending spurious ddl messages for tables that
are not part of the publication.
Example:
publisher:
postgres=# create table test (a int, b text);
CREATE TABLE
postgres=# create table test1 (a int, b text);
CREATE TABLE
postgres=# create table test2 (a int, b text);
CREATE TABLE
postgres=# create publication tap_pub for TABLE test, test1 with
(publish='ddl');
CREATE PUBLICATION
postgres=# \d
List of relations
Schema | Name | Type | Owner
--------+-------+-------+-------
public | test | table | ajin
public | test1 | table | ajin
public | test2 | table | ajin
(3 rows)
subscriber:
postgres=# create table test (a int, b text);
CREATE TABLE
postgres=# create table test1 (a int, b text);
CREATE TABLE
postgres=# create table test2 (a int, b text);
CREATE TABLE
postgres=# CREATE SUBSCRIPTION sub CONNECTION '<CONNINFO>' PUBLICATION tap_pub;
NOTICE: created replication slot "sub" on publisher
CREATE SUBSCRIPTION
postgres=# \d
List of relations
Schema | Name | Type | Owner
--------+-------+-------+-------
public | test | table | ajin
public | test1 | table | ajin
public | test2 | table | ajin
(3 rows)
publisher:
postgres=# drop table test1;
DROP TABLE
postgres=# \d
List of relations
Schema | Name | Type | Owner
--------+-------+-------+-------
public | test | table | ajin
public | test2 | table | ajin
(2 rows)
subscriber:
postgres=# \d
List of relations
Schema | Name | Type | Owner
--------+-------+-------+-------
public | test | table | ajin
public | test2 | table | ajin
(2 rows)
>>>> DROP table replicated.
publisher:
postgres=# drop table test2;
DROP TABLE
postgres=# \d
List of relations
Schema | Name | Type | Owner
--------+------+-------+-------
public | test | table | ajin
(1 row)
subscriber:
postgres=# \d
List of relations
Schema | Name | Type | Owner
--------+-------+-------+-------
public | test | table | ajin
public | test2 | table | ajin
(2 rows)
>>> DROP table not replicated since the table is not in the publication list.
regards,
Ajin Cherian
| Attachment | Content-Type | Size |
|---|---|---|
| v5-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 63.9 KB |
| v5-0003-Add-support-for-drop-tables-in-logical-replicatio.patch | application/octet-stream | 75.9 KB |
| v5-0002-Support-DDL-replication.patch | application/octet-stream | 102.9 KB |
| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-26 08:35:50 |
| Message-ID: | CAD21AoAv_wsBEK8jcqjBpatspiP=5E+qLokw9zCESBSvCAiRMg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Sat, May 14, 2022 at 6:02 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> > > 4. The handling related to partition tables seems missing because, on
> > > the subscriber-side, it always creates a relation entry in
> > > pg_subscription_rel which won't work. Check its interaction with
> > > publish_via_partition_root.
> >
> > I will test it out.
>
> Hi,
>
> patch 0010 properly handles partitioned table creation on the apply
> worker. Whether a replicated partitioned table should be added to
> pg_subscription_rel catalog depends on the setting of
> publish_via_partition_root of the publication. Thus we need to connect
> to the source DB and check if the partitioned table is in
> pg_catalog.pg_publication_tables after the apply worker creates the
> partitioned table.
>
> Thanks to Borui Yang for enabling and testing replication of DDL type
> T_RenameStmt in patch 0009.
>
> I've also rebased all the patches. Github branch of the same change
> can be found here:
> https://github.com/zli236/postgres/commits/ddl_replication
Thank you for updating the patches!
I've not looked at these patches in depth yet, but with this approach,
how do you think we can handle the DDL syntax differences between
major versions? DDL syntax or behavior could be changed by future
releases, and I think we need to somehow deal with the differences. For
example, if the user uses logical replication for a major version
upgrade, the publisher is older than the subscriber. We might have to
rewrite the DDL before applying it on the subscriber because the DDL
executed on the publisher no longer works on a new PostgreSQL version,
or we might have to add some options to the DDL before applying it
in order to keep the same behavior. This seems to require a different
solution from what the patch does for the problem you mentioned, such
as "DDL involving multiple tables where only some tables are
replicated”.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-26 22:19:21 |
| Message-ID: | CAAD30UK6T8bfW1JMaSSRDSynB6W05HjNrmvSp+tvXp-jdu9xFQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hi Masahiko,
> Thank you for updating the patches!
>
> I've not looked at these patches in-depth yet but with this approach,
> what do you think we can handle the DDL syntax differences between
> major versions? DDL syntax or behavior could be changed by future
> changes and I think we need to somehow deal with the differences. For
> example, if the user uses logical replication for major version
> upgrade, the publisher is older than the subscriber. We might have to
> rewrite the DDL before applying to the subscriber because the DDL
> executed on the publisher no longer work on a new PostgreSQL version
I don't think we will allow this kind of situation to happen in the
first place, for backward compatibility. If a DDL no longer works on a
new version of PostgreSQL, the user will have to change the application
code as well. So even if it happens for whatever reason, we could either
1. fail the apply worker and let the user fix such DDL, because they'll
have to fix the application code anyway when this happens, or
2. add guard-rail logic in the apply worker to automatically fix such
DDL if possible, knowing the versions of the source and target. Similar
logic must have been implemented for pg_dump/restore/upgrade.
> or we might have to add some options to the DDL before the application
> in order to keep the same behavior. This seems to require a different
> solution from what the patch does for the problem you mentioned such
> as "DDL involving multiple tables where only some tables are
> replicated”.
First of all, this case can only happen when the customer chooses to
replicate only a subset of the tables in a database, in which case
table-level DDL replication is chosen instead of database-level DDL
replication (where all tables and DDLs are replicated). I think the
solution would be:
1. make a best effort to detect such DDLs on the publisher and avoid
logging such DDLs in table-level DDL replication.
2. the apply worker will fail to replay such a command due to missing
objects if such DDLs didn't get filtered on the publisher for some
reason. This should be rare, and I think it's OK even if it happens;
we'll find out why and fix it.
Regards,
Zheng
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-27 02:03:05 |
| Message-ID: | CAA4eK1JQhz4y-1rYxwFxHYEAN-1JKeO0iT+Nip0N7jJUj_g7RA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, May 27, 2022 at 3:49 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> Hi Masahiko,
>
> > Thank you for updating the patches!
> >
> > I've not looked at these patches in-depth yet but with this approach,
> > what do you think we can handle the DDL syntax differences between
> > major versions? DDL syntax or behavior could be changed by future
> > changes and I think we need to somehow deal with the differences. For
>
> > example, if the user uses logical replication for major version
> > upgrade, the publisher is older than the subscriber. We might have to
> > rewrite the DDL before applying to the subscriber because the DDL
> > executed on the publisher no longer work on a new PostgreSQL version
>
> I don't think we will allow this kind of situation to happen in the
> first place for
> backward compatibility. If a DDL no longer works on a new version of
> PostgreSQL, the user will have to change the application code as well.
> So even if it happens for
> whatever reason, we could either
> 1. fail the apply worker and let the user fix such DDL because they'll
> have to fix the application code anyway when this happens.
> 2. add guard rail logic in the apply worker to automatically fix such
> DDL if possible, knowing the version of the source and target. Similar
> logic must have been implemented for pg_dump/restore/upgrade.
>
> > or we might have to add some options to the DDL before the application
> > in order to keep the same behavior. This seems to require a different
> > solution from what the patch does for the problem you mentioned such
>
> > as "DDL involving multiple tables where only some tables are
> > replicated”.
>
> First of all, this case can only happen when the customer chooses to
> only replicate a subset of the tables in a database in which case
> table level DDL replication is chosen instead of database level DDL
> replication (where all tables
> and DDLs are replicated). I think the solution would be:
> 1. make best effort to detect such DDLs on the publisher and avoid
> logging of such DDLs in table level DDL replication.
> 2. apply worker will fail to replay such command due to missing
> objects if such DDLs didn't get filtered on the publisher for some
> reason. This should be rare and I think it's OK even if it happens,
> we'll find out
> why and fix it.
>
FWIW, both these cases could be handled with the deparsing approach,
and the handling related to the drop of multiple tables where only a
few are published is already done in the last POC patch shared by Ajin
[1].
--
With Regards,
Amit Kapila.
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "rajesh(dot)rs0541(at)gmail(dot)com" <rajesh(dot)rs0541(at)gmail(dot)com> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-05-27 09:07:46 |
| Message-ID: | OS0PR01MB5716D74348FB1F76008B31CB94D89@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thursday, May 26, 2022 1:46 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Tue, May 10, 2022 at 9:33 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> > On Fri, May 6, 2022 at 11:24 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > > As we have hacked CreatePublication function for this POC, the
> > > regression tests are not passing but we can easily change it so that
> > > we invoke new functionality with the syntax proposed in this thread
> > > or with some other syntax and we shall do that in the next patch
> > > unless this approach is not worth pursuing.
> > >
> > > This POC is prepared by Ajin Cherian, Hou-San, and me.
> > >
> > > Thoughts?
> > >
> > > [1] -
> > >
> https://www.postgresql.org/message-id/20150215044814.GL3391%40alvh.n
> > > o-ip.org
> >
> > I have updated Amit's patch by including a public action "create" when
> > creating publication which is turned off by default.
> > Now the 'make check' tests pass. I also fixed a problem that failed to
> > create tables when the table has a primary key.
>
> I have updated this "deparse" patch-set to include support for 'drop table':
>
Here is the new version of the POC patches, which adds support for 'ALTER TABLE'.
For non-rewrite ALTER TABLE commands:
we deparse the command and WAL log the deparsed json string with the relid
of the altered table at the table_rewrite event trigger. The WALSender decodes
the WAL and sends it to the subscriber if the altered table is published. Most
ALTER TABLE commands are supported, except some recently introduced
commands (DDL related to PARTITIONED TABLE ...) that haven't been
supported by the current ddl_deparser yet; we will support those later.
For table_rewrite ALTER TABLE commands
(ALTER COLUMN TYPE, ADD COLUMN DEFAULT, SET LOGGED, SET ACCESS METHOD):
we deparse the command and WAL log the deparsed json string with the relid
of the altered table at the table_rewrite event trigger. The WALSender decodes
the WAL and sends it to the subscriber if the altered table is published.
Then, the WALSender will convert the upcoming rewrite INSERTs to UPDATEs
and send them to the subscriber so that the data between the publisher and
subscriber is always consistent. Note that tables that publish
rewrite ddl must have a replica identity configured in order to be able to
replicate the upcoming rewrite UPDATEs.
We do it this way for two reasons:
(1) The data before the rewrite ddl could already differ between the
publisher and subscriber. To make sure the extra data on the subscriber
that doesn't exist on the publisher also gets rewritten, we need to let
the subscriber execute the original rewrite ddl first to rewrite all the
data.
(2) The data after executing the rewrite ddl could differ between the
publisher and subscriber (due to different functions/operators used during
the rewrite), so we need to replicate the rewrite UPDATEs to keep the data
consistent.
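As a side note on the replica identity requirement mentioned above, a table
without a suitable unique index would need something like the following
before publishing rewrite ddl (the table name is just the one from the
example below; REPLICA IDENTITY FULL is only one way to satisfy the
requirement):

ALTER TABLE ddl_test REPLICA IDENTITY FULL;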
Here is an example for a rewrite ddl, ALTER TABLE ... ADD COLUMN ... DEFAULT:
A table on the publisher side has rows:
ddl_test(a)
a
----
1
2
The same table on the subscriber side has rows:
ddl_test(a)
a
----
1
2
3
4
-----------------------------------
If we execute "ALTER TABLE ddl_test ADD COLUMN b int DEFAULT random();" on
the publisher, rows (1) and (2) on the subscriber will be updated by the
rewrite UPDATEs received from the publisher, while rows (3) and (4) will be
updated by executing the "ADD COLUMN b int DEFAULT random();" on the
subscriber.
-----------------------------------
The table on the publisher side now has rows:
ddl_test(a,b)
a|b
----
1|random num(pub)
2|random num(pub)
The same table on the subscriber side has rows:
ddl_test(a,b)
a|b
----
1|random num(pub)
2|random num(pub)
3|random num(sub)
4|random num(sub)
TO IMPROVE:
This approach could be improved by letting the subscriber update only
the extra data itself, instead of running the full rewrite ddl, and then
use the upcoming rewrite UPDATEs to rewrite the rest of the data. To
achieve this, we could modify the deparsed json string to temporarily
remove the rewrite part and add some logic on the subscriber to update
the extra data.
Besides, we may not need to send rewrite changes for all types of rewrite
ddl; for example, it seems fine to skip sending rewrite changes for ALTER
TABLE SET LOGGED, as the data in the table isn't actually changed. We
could use the deparser and event trigger to filter these ddls and skip
sending rewrite changes for them.
Best regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v5-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 85.8 KB |
| v5-0002-Support-DDL-replication.patch | application/octet-stream | 123.7 KB |
| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-27 16:15:18 |
| Message-ID: | CAD21AoDtNMxE30ExN5b6AXriyjPpEZZtKDbgGD91udS8FBBQ_Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, May 27, 2022 at 7:19 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> Hi Masahiko,
>
> > Thank you for updating the patches!
> >
> > I've not looked at these patches in-depth yet but with this approach,
> > what do you think we can handle the DDL syntax differences between
> > major versions? DDL syntax or behavior could be changed by future
> > changes and I think we need to somehow deal with the differences. For
>
> > example, if the user uses logical replication for major version
> > upgrade, the publisher is older than the subscriber. We might have to
> > rewrite the DDL before applying to the subscriber because the DDL
> > executed on the publisher no longer work on a new PostgreSQL version
>
> I don't think we will allow this kind of situation to happen in the
> first place for
> backward compatibility.
It seems like a big limitation to me.
> If a DDL no longer works on a new version of
> PostgreSQL, the user will have to change the application code as well.
> So even if it happens for
> whatever reason, we could either
> 1. fail the apply worker and let the user fix such DDL because they'll
> have to fix the application code anyway when this happens.
Once the apply worker receives the DDL, if the DDL doesn't work on the
subscriber, it will enter an infinite loop until the problem is fixed.
If the failure is due to a syntax error, how does the user fix it?
> 2. add guard rail logic in the apply worker to automatically fix such
> DDL if possible, knowing the version of the source and target. Similar
> logic must have been implemented for pg_dump/restore/upgrade.
If I'm not missing something, there is no such implementation in
pg_dump/restore/upgrade. When we use pg_dump/pg_restore for major
version upgrades, we usually use the newer version pg_dump to fetch
objects from the older version server, then restore the objects by
using the newer version pg_restore.
>
> > or we might have to add some options to the DDL before the application
> > in order to keep the same behavior. This seems to require a different
> > solution from what the patch does for the problem you mentioned such
>
> > as "DDL involving multiple tables where only some tables are
> > replicated”.
>
> First of all, this case can only happen when the customer chooses to
> only replicate a subset of the tables in a database in which case
> table level DDL replication is chosen instead of database level DDL
> replication (where all tables
> and DDLs are replicated). I think the solution would be:
> 1. make best effort to detect such DDLs on the publisher and avoid
> logging of such DDLs in table level DDL replication.
I think it's better to support this case.
> 2. apply worker will fail to replay such command due to missing
> objects if such DDLs didn't get filtered on the publisher for some
> reason. This should be rare and I think it's OK even if it happens,
> we'll find out
> why and fix it.
I'm not sure it's rare since replicating a subset of tables is a
common use case of logical replication. But even if we want to go this
way I think we should consider how to fix it at this stage, otherwise
we will end up redesigning it.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-05-30 06:52:29 |
| Message-ID: | CAD21AoCnGwx2F+Ph3dpoJVq0YR8ke3P59XCs439pW=BRfdzgTQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, May 27, 2022 at 11:03 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, May 27, 2022 at 3:49 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> >
> > Hi Masahiko,
> >
> > > Thank you for updating the patches!
> > >
> > > I've not looked at these patches in-depth yet but with this approach,
> > > what do you think we can handle the DDL syntax differences between
> > > major versions? DDL syntax or behavior could be changed by future
> > > changes and I think we need to somehow deal with the differences. For
> >
> > > example, if the user uses logical replication for major version
> > > upgrade, the publisher is older than the subscriber. We might have to
> > > rewrite the DDL before applying to the subscriber because the DDL
> > > executed on the publisher no longer work on a new PostgreSQL version
> >
> > I don't think we will allow this kind of situation to happen in the
> > first place for
> > backward compatibility. If a DDL no longer works on a new version of
> > PostgreSQL, the user will have to change the application code as well.
> > So even if it happens for
> > whatever reason, we could either
> > 1. fail the apply worker and let the user fix such DDL because they'll
> > have to fix the application code anyway when this happens.
> > 2. add guard rail logic in the apply worker to automatically fix such
> > DDL if possible, knowing the version of the source and target. Similar
> > logic must have been implemented for pg_dump/restore/upgrade.
> >
> > > or we might have to add some options to the DDL before the application
> > > in order to keep the same behavior. This seems to require a different
> > > solution from what the patch does for the problem you mentioned such
> >
> > > as "DDL involving multiple tables where only some tables are
> > > replicated”.
> >
> > First of all, this case can only happen when the customer chooses to
> > only replicate a subset of the tables in a database in which case
> > table level DDL replication is chosen instead of database level DDL
> > replication (where all tables
> > and DDLs are replicated). I think the solution would be:
> > 1. make best effort to detect such DDLs on the publisher and avoid
> > logging of such DDLs in table level DDL replication.
> > 2. apply worker will fail to replay such command due to missing
> > objects if such DDLs didn't get filtered on the publisher for some
> > reason. This should be rare and I think it's OK even if it happens,
> > we'll find out
> > why and fix it.
> >
>
> FWIW, both these cases could be handled with the deparsing approach,
> and the handling related to the drop of multiple tables where only a
> few are published is already done in the last POC patch shared by Ajin
> [1].
>
Right. So I'm inclined to think that the deparsing approach is better on
this point, as well as on the point mentioned by Álvaro before [1].
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-06-02 12:14:00 |
| Message-ID: | OS0PR01MB571695EDF9EAB2422FBF2C1094DE9@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Monday, May 30, 2022 2:52 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, May 27, 2022 at 11:03 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, May 27, 2022 at 3:49 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > >
> > > Hi Masahiko,
> > >
> > > > Thank you for updating the patches!
> > > >
> > > > I've not looked at these patches in-depth yet but with this approach,
> > > > what do you think we can handle the DDL syntax differences between
> > > > major versions? DDL syntax or behavior could be changed by future
> > > > changes and I think we need to somehow deal with the differences. For
> > >
> > > > example, if the user uses logical replication for major version
> > > > upgrade, the publisher is older than the subscriber. We might have to
> > > > rewrite the DDL before applying to the subscriber because the DDL
> > > > executed on the publisher no longer work on a new PostgreSQL version
> > >
> > > I don't think we will allow this kind of situation to happen in the
> > > first place for
> > > backward compatibility. If a DDL no longer works on a new version of
> > > PostgreSQL, the user will have to change the application code as well.
> > > So even if it happens for
> > > whatever reason, we could either
> > > 1. fail the apply worker and let the user fix such DDL because they'll
> > > have to fix the application code anyway when this happens.
> > > 2. add guard rail logic in the apply worker to automatically fix such
> > > DDL if possible, knowing the version of the source and target. Similar
> > > logic must have been implemented for pg_dump/restore/upgrade.
> > >
> > > > or we might have to add some options to the DDL before the application
> > > > in order to keep the same behavior. This seems to require a different
> > > > solution from what the patch does for the problem you mentioned such
> > >
> > > > as "DDL involving multiple tables where only some tables are
> > > > replicated”.
> > >
> > > First of all, this case can only happen when the customer chooses to
> > > only replicate a subset of the tables in a database in which case
> > > table level DDL replication is chosen instead of database level DDL
> > > replication (where all tables
> > > and DDLs are replicated). I think the solution would be:
> > > 1. make best effort to detect such DDLs on the publisher and avoid
> > > logging of such DDLs in table level DDL replication.
> > > 2. apply worker will fail to replay such command due to missing
> > > objects if such DDLs didn't get filtered on the publisher for some
> > > reason. This should be rare and I think it's OK even if it happens,
> > > we'll find out
> > > why and fix it.
> > >
> >
> > FWIW, both these cases could be handled with the deparsing approach,
> > and the handling related to the drop of multiple tables where only a
> > few are published is already done in the last POC patch shared by Ajin
> > [1].
> >
>
> Right. So I'm inclined to think that deparsing approach is better from
> this point as well as the point mentioned by Álvaro before[1].
I agree. One more point about the deparsing approach is that it can also
help to replicate CREATE TABLE AS/SELECT INTO in a better way.
The main idea of replicating CREATE TABLE AS is that we deparse the CREATE
TABLE AS into a simple CREATE TABLE (without the subquery) command and WAL
log it after creating the table and before writing data into the table, and
replicate the incoming writes later as normal INSERTs. In this approach, we
don't execute the subquery on the subscriber, so we don't need to make sure
all the objects referenced in the subquery also exist on the subscriber. And
this approach works for all kinds of commands (e.g. CREATE TABLE AS
[SELECT][EXECUTE][VALUES]).
One problem with this approach is that we cannot use the current event
triggers to deparse or WAL log the CREATE TABLE, because none of the existing
event triggers fire after creating the table and before inserting the data.
To solve this, one idea is that we could directly add some code at the end of
create_ctas_internal() to deparse and WAL log it. Moreover, we could even
introduce a new type of event trigger (table_create) which would be fired at
the expected timing, so that we can use the trigger function to deparse and
WAL log. I am not sure which way is better. I temporarily use the second
idea, which introduces a new type of event trigger, in the 0003 POC patch.
In the POC patch, we deparse the command in the table_create event trigger and
WAL log the deparsed json string. The walsender will send the string to the
subscriber, and the incoming INSERTs will also be replicated.
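To illustrate the intended behavior, here is a rough sketch of what the
subscriber would end up applying under this approach (the names, column
types, and row values below are made up for illustration; the real column
definitions would come from the deparsed catalog state, not from re-running
the subquery):

-- executed on the publisher:
CREATE TABLE new_tab AS SELECT a, b FROM src_tab;

-- roughly what is replicated and applied on the subscriber:
CREATE TABLE public.new_tab (a integer, b text);
-- ...followed by the rows the publisher wrote, decoded as ordinary INSERTs:
INSERT INTO public.new_tab (a, b) VALUES (1, 'one');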
Best regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v6-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 85.8 KB |
| v6-0002-Support-DDL-replication.patch | application/octet-stream | 127.5 KB |
| v6-0003-support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 14.9 KB |
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-03 11:15:36 |
| Message-ID: | CAA4eK1K+1ZCX0NDC-k+ss=YcxxTGUgo1xzkvS12zP8gBp1CiWQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, Jun 2, 2022 at 5:44 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> The main idea of replicating the CREATE TABLE AS is that we deprase the CREATE
> TABLE AS into a simple CREATE TABLE(without subquery) command and WAL log it
> after creating the table and before writing data into the table and replicate
> the incoming writes later as normal INSERTs. In this apporach, we don't execute
> the subquery on subscriber so that don't need to make sure all the objects
> referenced in the subquery also exists in subscriber. And This approach works
> for all kind of commands(e.g. CRAETE TABLE AS [SELECT][EXECUTE][VALUES])
>
> One problem of this approach is that we cannot use the current trigger to
> deparse or WAL log the CREATE TABLE. Because none of the even trigger is fired
> after creating the table and before inserting the data. To solve this, one idea
> is that we could directly add some code at the end of create_ctas_internal() to
> deparse and WAL log it. Moreover, we could even introduce a new type of event
> trigger(table_create) which would be fired at the expected timing so that we
> can use the trigger function to deparse and WAL log. I am not sure which way is
> better.
>
I am also not able to think of a better way to replicate CREATE TABLE
AS/SELECT INTO other than to use a new type of event trigger. I think
it is better to discuss this new type of event trigger in a separate
thread with the use case of DDL replication unless we have a better
idea to achieve this.
Few comments:
===============
1. Why not capture the CREATE TABLE/CREATE TABLE AS ... command with
EventTriggerCollectSimpleCommand instead of using
EventTriggerCreateTableStart?
2.
+ /*
+ * Use PG_TRY to ensure parsetree is reset even when one trigger
+ * fails. (This is perhaps not necessary, as the currentState variable will
+ * be removed shortly by our caller, but it seems better to play safe.)
+ */
+ old_parsetree = currentEventTriggerState->currentCommand->parsetree;
+ currentEventTriggerState->currentCommand->d.simple.address = address;
+ currentEventTriggerState->currentCommand->parsetree = parsetree;
Instead of doing this can't we use the parsetree stored in trigdata to
deparse the statement?
3. Is there a reason to invoke the trigger after defining the relation
instead of doing it before, similar to the table_rewrite trigger
(EventTriggerTableRewrite)?
4. It should be published for an "all tables" publication, similar to 'create table'.
5. The new code in CreatePublication looks quite haphazard, can we
improve it by moving it into separate functions?
--
With Regards,
Amit Kapila.
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-06-07 02:59:31 |
| Message-ID: | OS0PR01MB57167C38E7D7FA5388DEDB5294A59@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Friday, June 3, 2022 7:16 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Jun 2, 2022 at 5:44 PM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > The main idea of replicating the CREATE TABLE AS is that we deparse
> > the CREATE TABLE AS into a simple CREATE TABLE (without the subquery)
> > command, WAL-log it after creating the table and before writing
> > data into the table, and replicate the incoming writes later as normal
> > INSERTs. In this approach, we don't execute the subquery on the subscriber,
> > so we don't need to make sure that all the objects referenced in the
> > subquery also exist on the subscriber. And this approach works for all
> > kinds of commands (e.g. CREATE TABLE AS [SELECT][EXECUTE][VALUES]).
> >
> > One problem with this approach is that we cannot use the existing event
> > triggers to deparse or WAL-log the CREATE TABLE, because none of the event
> > triggers is fired after creating the table and before inserting the
> > data. To solve this, one idea is that we could directly add some code
> > at the end of create_ctas_internal() to deparse and WAL-log it.
> > Moreover, we could even introduce a new type of event
> > trigger (table_create) which would be fired at the expected time so
> > that we can use the trigger function to deparse and WAL-log. I am not
> > sure which way is better.
> >
>
> I am also not able to think of a better way to replicate CREATE TABLE AS/SELECT
> INTO other than to use a new type of event trigger. I think it is better to discuss
> this new type of event trigger in a separate thread with the use case of DDL
> replication unless we have a better idea to achieve this.
>
> Few comments:
> ===============
> 1. Why not capture the CREATE TABLE/CREATE TABLE AS ... command with
> EventTriggerCollectSimpleCommand instead of using
> EventTriggerCreateTableStart?
After thinking about it more, I removed this part, as it seems enough for the
new type of trigger to only catch CTAS and SELECT INTO.
> 2.
> + /*
> + * Use PG_TRY to ensure parsetree is reset even when one trigger
> + * fails. (This is perhaps not necessary, as the currentState variable
> + will
> + * be removed shortly by our caller, but it seems better to play safe.)
> + */ old_parsetree =
> + currentEventTriggerState->currentCommand->parsetree;
> + currentEventTriggerState->currentCommand->d.simple.address = address;
> + currentEventTriggerState->currentCommand->parsetree = parsetree;
>
> Instead of doing this can't we use the parsetree stored in trigdata to deparse
> the statement?
We need to use the real CreateStmt, which contains the actual column info. But
I agree it looks hacky, and I have adjusted this part in the patch.
> 3. Is there a reason to invoke the trigger after defining the relation instead of
> doing it before, similar to the table_rewrite trigger (EventTriggerTableRewrite)?
To deparse the CreateStmt we need to access the catalog info of the newly
created table, so I invoke the trigger after the table is defined.
> 4. It should be published for an 'all tables' publication, similar to 'create table'.
Agreed.
> 5. The new code in CreatePublication looks quite haphazard, can we improve it
> by moving it into separate functions?
Changed.
Thanks for the comments.
Here is the new version POC patch set for DDL replication.
For V7-0002, I tried to improve the code in publicationcmds.c by introducing a
common function to create the event trigger, which avoids some duplicate code.
Besides, I merged Ajin's DROP CASCADE patch into it and adjusted some other
code style issues.
For V7-0003, I changed the new type of trigger to only catch CREATE TABLE
AS/SELECT INTO and renamed it to the "table_init_write" trigger, which seems
more consistent with its usage. I kept the functions
EventTriggerTableInitWriteStart/End, as we need them to store the original
parse tree, which is needed to filter on the tag name (we need the tag of the
original parse tree to judge which commands should fire the event trigger; see
EventTriggerCommonSetup()). Besides, I also adjusted the code to make
it look less hacky.
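For readers following along, usage of the proposed trigger type might look
roughly like the sketch below; the "table_init_write" event comes from the
V7-0003 POC patch and the trigger function name is a placeholder, so neither
is part of released PostgreSQL:

-- Hypothetical sketch based on the V7-0003 patch: "table_init_write"
-- is the proposed event, and the trigger function is a placeholder name.
CREATE EVENT TRIGGER pub_ctas_capture ON table_init_write
    EXECUTE FUNCTION my_deparse_and_log_func();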
Best regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v7-0002-Support-DDL-replication.patch | application/octet-stream | 128.5 KB |
| v7-0003-support-CREATE-TABLE-A-ELECT-INTO.patch | application/octet-stream | 14.1 KB |
| v7-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 85.8 KB |
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-06-09 11:43:51 |
| Message-ID: | OS0PR01MB5716733D6C79D2198E5773CC94A79@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hi,
I did some research for one potential problem[1] mentioned earlier which is related
to the function execution when replicating DDL.
[1]> 4. Statements that have nondeterministic side effects (e.g., as caused
> by triggers, stored procedures, user-defined functions) may result in
> different side effects occurring on each subscriber.
>
> Think how to handle triggers and functions with same name but different
> purpose.
>
Examples:
ALTER TABLE ... ADD CONSTRAINT ... CHECK (func())
ALTER TABLE ... ADD COLUMN ... DEFAULT func()
CREATE TRIGGER ... EXECUTE PROCEDURE func()
...
When replicating the above DDLs, there are some cases we need to think about.
----------------
1) The functions used in the DDL have the same definition on pub/sub.
In this case, if the functions include DML/DDL operations, replication will
result in unexpected behavior.
For example:
--- both pub and sub have the same function test_func().
create function test_func()
returns integer as '
begin
CREATE TABLE test(a int);
INSERT INTO test values(1);
return 0;
end
' language plpgsql;
--- We replicate the following DDL
ALTER TABLE t ADD COLUMN a int DEFAULT test_func();
There are three SQL statements that would be replicated to the subscriber:
CREATE TABLE test(a int);
INSERT INTO test VALUES(1);
ALTER TABLE t ADD COLUMN a int DEFAULT test_func();
Then we would create the table "test" twice and insert the same value twice
(first by executing the DEFAULT function, second by replaying the CREATE TABLE
and INSERT commands) on the subscriber.
One possible idea is to only allow replicating DDL that uses 'immutable/stable'
functions, which cannot perform write actions, so that we don't have this
problem. But that seems a bit restrictive.
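(As a point of reference, function volatility is already tracked in the
catalog, so a publisher-side check along these lines could be a simple lookup.
A minimal sketch, assuming the test_func() defined above:)

-- pg_proc.provolatile is 'i' (immutable), 's' (stable) or 'v' (volatile);
-- a publisher-side filter could refuse to replicate DDL that calls a
-- volatile function such as the test_func() above.
SELECT p.provolatile
FROM pg_proc p
WHERE p.oid = 'public.test_func()'::regprocedure;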
----------------
2) The functions used in the DDL have different definitions on pub/sub.
I am not sure how to handle this case, as it could result in unpredictable
behavior depending on the differing definitions. And it's difficult to compare
function definitions between pub/sub because a function could call other
functions, nested arbitrarily deep.
OTOH, the current behavior of this case on HEAD is that we don't check the
consistency of the functions executed on pub/sub. For example, row triggers
will always be fired on the subscriber (when enabled) without checking whether
the same trigger exists on the publisher. So it might be acceptable to just
document this restriction, although I am not sure.
*******************
- About the solution for the above two points, I think one option could be:
We can document that the user should make sure the DDL to be replicated does
not execute any function which could in turn execute DML/DDL. This seems
acceptable, as executing DML/DDL from a CONSTRAINT or DEFAULT function is a
pretty rare case, and the documentation [1] already suggests something similar
for CONSTRAINT functions. Besides, we can also document that the functions
should be defined in a consistent way on pub/sub.
[1] https://www.postgresql.org/docs/devel/ddl-constraints.html
> PostgreSQL assumes that CHECK constraints' conditions are immutable, that is,
> they will always give the same result for the same input row. This assumption
> is what justifies examining CHECK constraints only when rows are inserted or
> updated, and not at other times. (The warning above about not referencing
> other table data is really a special case of this restriction.)
- Another solution could be:
We could introduce a new function flag (replica_safety) whose values could be
'safe'/'unsafe'. 'safe' indicates that the function is safe to replicate to the
subscriber and safe to execute when replaying DDL from another publisher.
When replicating the DDL on the publisher, we first check whether the functions
used in the DDL are replica-safe; if not, we don't replicate that DDL. If they
are, then before replicating the DDL we send a new type of FUNCTIONS message,
which includes information about the functions used in the DDL, to the
subscriber, similar to the RELATION messages. On the subscriber, we check
whether the received functions exist and whether they are also marked
replica-safe there; if not, we report an error and suggest the user adjust the
problematic function. I haven't tried to implement it, so I could be missing
something. Thoughts?
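(Purely for illustration, the user-facing side of such a flag might look
something like the sketch below; the REPLICA SAFE / REPLICA UNSAFE keywords
are entirely hypothetical and exist in no PostgreSQL release or patch yet:)

-- Hypothetical syntax sketch only: REPLICA SAFE / REPLICA UNSAFE are
-- not existing PostgreSQL options.
CREATE FUNCTION test_func() RETURNS integer
    LANGUAGE plpgsql REPLICA SAFE
    AS 'begin return 0; end';

ALTER FUNCTION test_func() REPLICA UNSAFE;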
BTW, attached is the POC patch set, which only fixes a CFbot error and adds
some test cases provided by Osumi-san.
Best regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v8-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 13.2 KB |
| v8-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 85.8 KB |
| v8-0002-Support-DDL-replication.patch | application/octet-stream | 124.7 KB |
| v8-0003-support-CREATE-TABLE-A-ELECT-INTO.patch | application/octet-stream | 14.1 KB |
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-10 06:26:57 |
| Message-ID: | CAA4eK1JVynFsj+mcRWj9sewR2yNUs6LuNxJ0eN-gNJ83oKcUOQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, Jun 9, 2022 at 5:14 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Hi,
>
> I did some research for one potential problem[1] mentioned earlier which is related
> to the function execution when replicating DDL.
>
> [1]> 4. Statements that have nondeterministic side effects (e.g., as caused
> > by triggers, stored procedures, user-defined functions) may result in
> > different side effects occurring on each subscriber.
> >
> > Think how to handle triggers and functions with same name but different
> > purpose.
> >
>
> Examples:
> ALTER TABLE ADD CONSTRAINT func()
> ALTER TABLE ADD COLUMN DEFAULT func()
> CREATE TRIGGER ... execute procedure func()
> ...
>
> When replication the above DDLs, there are some cases we need to think about.
>
> ----------------
> 1) The functions used in DDL have same definition among pub/sub.
>
> In this case, if the functions include DML/DDL operations, it will result in
> unexpected behavior.
>
> For example:
>
> --- both pub and sub have the same function test_func().
> create function test_func()
> returns integer as '
> begin
> CREATE TABLE test(a int);
> INSERT INTO test values(1);
> return 0;
> end
> ' language plpgsql;
>
> --- We replicate the following DDL
> ALTER TABLE t ADD COLUMN a int DEFAULT test_func();
>
> There are three SQLs that would be replicated to the subscriber:
> CREATE TABLE test(a int);
> INSERT(1);
> ALTER TABLE t ADD COLUMN a int DEFAULT test_func();
>
> Then, we would create the table "test" twice and insert same value twice(first by
> executing DEFAULT function, second by replaying the CREATE TABLE and INSERT
> command) on subcriber.
>
The other kind of problem is that we don't know whether the tables being
accessed by such DDL/DML are published or not. So blindly allowing such
functions can lead to unintended consequences, such as the tables accessed in
those functions not even existing on the subscriber side.
> One possible idea is that we only allow replicating 'immutable/stable' function
> which won't include write action so that we don't have this problem. But it
> seems a bit restrictive.
>
> ----------------
>
> 2) The functions used in DDL have different definitions among pub/sub.
>
> I am not sure how to handle this case as this could result in unpredictable
> behavior based on the different definitions. And it's difficult to compare the
> function definition among pub/sub because the function could call other
> functions nested.
>
> OTOH, the current behavior of this case on HEAD is that we don't check the
> consistency of the functions executed on pub/sub. For example: the row triggers
> will always be fired on subscriber(when enabled) without checking if the same
> trigger exists on publisher. So, it might be acceptable if we document this
> restriction, although I am not sure.
>
> *******************
>
> - About the solution for the above two points. I think one solution could be:
>
> We can document that user should make sure the DDL to be replicated should not
> execute any function which could execute DML/DDL. This seems acceptable as it's
> a pretty rare case to execute DML/DDL in a CONSTRAINT function or DEFAULT
> function. And the document[1] already suggest similar thing for CONSTRAINT
> function. Besides, we can also document that and the functions should be
> defined in a consistent way among pub/sub.
>
> [1] https://www.postgresql.org/docs/devel/ddl-constraints.html
> > PostgreSQL assumes that CHECK constraints' conditions are immutable, that is,
> > they will always give the same result for the same input row. This assumption
> > is what justifies examining CHECK constraints only when rows are inserted or
> > updated, and not at other times. (The warning above about not referencing
> > other table data is really a special case of this restriction.)
>
> - Another solution could be:
>
> We coud introduce a new function flag called (replica_safety) and the values
> could be 'safe'/'unsafe'. 'safe' indicates that it's safe to be replicated to
> subscriber and it's safe to be executed when replay the DDL from other
> publisher.
>
Yeah, something like this could be a good idea but I think for the
first version we should simply raise a WARNING for such DDLs
indicating that they won't be replicated.
--
With Regards,
Amit Kapila.
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-12 06:46:27 |
| Message-ID: | CAAD30ULe-cZTELQJbcAahsOFoUO-ftMxorwBTmj7uYK=_=mwxg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> > > > > I've not looked at these patches in-depth yet but with this approach,
> > > > > what do you think we can handle the DDL syntax differences between
> > > > > major versions? DDL syntax or behavior could be changed by future
> > > > > changes and I think we need to somehow deal with the differences. For
> > > >
> > > > > example, if the user uses logical replication for major version
> > > > > upgrade, the publisher is older than the subscriber. We might have to
> > > > > rewrite the DDL before applying to the subscriber because the DDL
> > > > > executed on the publisher no longer work on a new PostgreSQL version
> > > >
> > > > I don't think we will allow this kind of situation to happen in the
> > > > first place for
> > > > backward compatibility. If a DDL no longer works on a new version of
> > > > PostgreSQL, the user will have to change the application code as well.
> > > > So even if it happens for
> > > > whatever reason, we could either
> > > > 1. fail the apply worker and let the user fix such DDL because they'll
> > > > have to fix the application code anyway when this happens.
> > > > 2. add guard rail logic in the apply worker to automatically fix such
> > > > DDL if possible, knowing the version of the source and target. Similar
> > > > logic must have been implemented for pg_dump/restore/upgrade.
> > > >
> > > > > or we might have to add some options to the DDL before the application
> > > > > in order to keep the same behavior. This seems to require a different
> > > > > solution from what the patch does for the problem you mentioned such
> > > >
> > > > > as "DDL involving multiple tables where only some tables are
> > > > > replicated”.
> > > >
> > > > First of all, this case can only happen when the customer chooses to
> > > > only replicate a subset of the tables in a database in which case
> > > > table level DDL replication is chosen instead of database level DDL
> > > > replication (where all tables
> > > > and DDLs are replicated). I think the solution would be:
> > > > 1. make best effort to detect such DDLs on the publisher and avoid
> > > > logging of such DDLs in table level DDL replication.
> > > > 2. apply worker will fail to replay such command due to missing
> > > > objects if such DDLs didn't get filtered on the publisher for some
> > > > reason. This should be rare and I think it's OK even if it happens,
> > > > we'll find out
> > > > why and fix it.
> > > >
> > >
> > > FWIW, both these cases could be handled with the deparsing approach,
> > > and the handling related to the drop of multiple tables where only a
> > > few are published is already done in the last POC patch shared by Ajin
> > > [1].
> > >
> >
> > Right. So I'm inclined to think that deparsing approach is better from
> > this point as well as the point mentioned by Álvaro before[1].
>
> I agree. One more point about deparsing approach is that it can also
> help to replicate CREATE TABLE AS/SELECT INTO in a better way.
>
> The main idea of replicating the CREATE TABLE AS is that we deparse the CREATE
> TABLE AS into a simple CREATE TABLE (without the subquery) command, WAL-log it
> after creating the table and before writing data into the table, and replicate
> the incoming writes later as normal INSERTs. In this approach, we don't execute
> the subquery on the subscriber, so we don't need to make sure that all the
> objects referenced in the subquery also exist on the subscriber. And this
> approach works for all kinds of commands (e.g. CREATE TABLE AS [SELECT][EXECUTE][VALUES]).
>
> One problem with this approach is that we cannot use the existing event
> triggers to deparse or WAL-log the CREATE TABLE, because none of the event
> triggers is fired after creating the table and before inserting the data. To
> solve this, one idea is that we could directly add some code at the end of
> create_ctas_internal() to deparse and WAL-log it. Moreover, we could even
> introduce a new type of event trigger (table_create) which would be fired at
> the expected time so that we can use the trigger function to deparse and
> WAL-log. I am not sure which way is better. I temporarily used the second
> idea, which introduces a new type of event trigger, in the 0003 POC patch.
Hi, I agree that an intermediate structured format (with all objects
schema-qualified) makes it easier to handle syntax differences between the
publisher and the subscriber. Such a structured format is also likely easier
for other logical replication consumers to use.
However, to make things more maintainable, would it be better to use
the existing serialization/deserialization functions in
out/readfuncs.c to generate the parsetree representation of the DDL
command?
It turns out that support for DDL commands is mostly missing in
out/readfuncs.c at the moment; see the following comment from outfuncs.c:
* Output support for raw parsetrees
* is somewhat incomplete, too; in particular, utility statements are
* almost entirely unsupported. We try to support everything that can
* appear in a raw SELECT, though.
So what about adding support for utility statements in out/readfuncs.c
so that nodeToString()/stringToNode() works on raw parsetrees of
utility statements? I think there'll be some benefits:
1. It's less code compared to introducing brand new
serialization/deserialization code for DDL commands.
2. It's more maintainable since hackers are already familiar with
out/readfuncs.c.
3. Support for utility statements in out/readfuncs.c may turn out to
be useful for future features that need serialization/deserialization
of DDL raw parsetrees.
In patch 0012 I explored the above idea by changing the logging format of the
logical ddlmessage to the serialization of the DDL command's raw parse tree
using nodeToString(); the apply worker deserializes the message back into the
raw parse tree by calling stringToNode().
Here are the detailed changes:
- Add serialization/deserialization functions in outfuncs.c/readfuncs.c
for CreateTableStmt, AlterTableStmt, DropStmt, CreateFunctionStmt and
AlterFunctionStmt.
- Modify the serialization process to always schema-qualify object names;
this is done by outQualifiedName() and a change in _outRangeVar().
- Change the input of LogLogicalDDLMessage() to use nodeToString(parsetree).
- Change the apply worker to call stringToNode(ddlmessage) to get
the raw parse tree back and then directly execute it.
This patch doesn't introduce any deparsing functionality yet, but it provides
the flexibility to edit the command being replayed by directly modifying the
raw parse tree on the apply worker (e.g. setting the missing_ok flag of a
DropStmt is equivalent to adding "IF EXISTS" to the statement).
Thoughts?
Regards,
Zheng Li
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-06-13 10:08:16 |
| Message-ID: | OS0PR01MB571684CBF660D05B63B4412C94AB9@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Sunday, June 12, 2022 2:46 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> > > > > > I've not looked at these patches in-depth yet but with this
> > > > > > approach, what do you think we can handle the DDL syntax
> > > > > > differences between major versions? DDL syntax or behavior
> > > > > > could be changed by future changes and I think we need to
> > > > > > somehow deal with the differences. For
> > > > >
> > > > > > example, if the user uses logical replication for major
> > > > > > version upgrade, the publisher is older than the subscriber.
> > > > > > We might have to rewrite the DDL before applying to the
> > > > > > subscriber because the DDL executed on the publisher no longer
> > > > > > work on a new PostgreSQL version
> > > > >
> > > > > I don't think we will allow this kind of situation to happen in
> > > > > the first place for backward compatibility. If a DDL no longer
> > > > > works on a new version of PostgreSQL, the user will have to
> > > > > change the application code as well.
> > > > > So even if it happens for
> > > > > whatever reason, we could either 1. fail the apply worker and
> > > > > let the user fix such DDL because they'll have to fix the
> > > > > application code anyway when this happens.
> > > > > 2. add guard rail logic in the apply worker to automatically fix
> > > > > such DDL if possible, knowing the version of the source and
> > > > > target. Similar logic must have been implemented for
> pg_dump/restore/upgrade.
> > > > >
> > > > > > or we might have to add some options to the DDL before the
> > > > > > application in order to keep the same behavior. This seems to
> > > > > > require a different solution from what the patch does for the
> > > > > > problem you mentioned such
> > > > >
> > > > > > as "DDL involving multiple tables where only some tables are
> > > > > > replicated”.
> > > > >
> > > > > First of all, this case can only happen when the customer
> > > > > chooses to only replicate a subset of the tables in a database
> > > > > in which case table level DDL replication is chosen instead of
> > > > > database level DDL replication (where all tables and DDLs are
> > > > > replicated). I think the solution would be:
> > > > > 1. make best effort to detect such DDLs on the publisher and
> > > > > avoid logging of such DDLs in table level DDL replication.
> > > > > 2. apply worker will fail to replay such command due to missing
> > > > > objects if such DDLs didn't get filtered on the publisher for
> > > > > some reason. This should be rare and I think it's OK even if it
> > > > > happens, we'll find out why and fix it.
> > > > >
> > > >
> > > > FWIW, both these cases could be handled with the deparsing
> > > > approach, and the handling related to the drop of multiple tables
> > > > where only a few are published is already done in the last POC
> > > > patch shared by Ajin [1].
> > > >
> > >
> > > Right. So I'm inclined to think that deparsing approach is better
> > > from this point as well as the point mentioned by Álvaro before[1].
> >
> > I agree. One more point about deparsing approach is that it can also
> > help to replicate CREATE TABLE AS/SELECT INTO in a better way.
> >
> > The main idea of replicating the CREATE TABLE AS is that we deparse
> > the CREATE TABLE AS into a simple CREATE TABLE (without the subquery)
> > command, WAL-log it after creating the table and before writing
> > data into the table, and replicate the incoming writes later as normal
> > INSERTs. In this approach, we don't execute the subquery on the subscriber,
> > so we don't need to make sure that all the objects referenced in the
> > subquery also exist on the subscriber. And this approach works for all
> > kinds of commands (e.g. CREATE TABLE AS [SELECT][EXECUTE][VALUES]).
> >
> > One problem with this approach is that we cannot use the existing event
> > triggers to deparse or WAL-log the CREATE TABLE, because none of the event
> > triggers is fired after creating the table and before inserting the
> > data. To solve this, one idea is that we could directly add some code
> > at the end of create_ctas_internal() to deparse and WAL-log it.
> > Moreover, we could even introduce a new type of event
> > trigger (table_create) which would be fired at the expected time so
> > that we can use the trigger function to deparse and WAL-log. I am not
> > sure which way is better. I temporarily used the second idea, which
> > introduces a new type of event trigger, in the 0003 POC patch.
>
> Hi, I agree that an intermediate structured format (with all objects schema
> qualified) makes it easier to handle syntax differences between the publisher
> and the subscriber. Such structured format is also likely easier to use by other
> logical replication consumers.
> However, to make things more maintainable, would it be better to use the
> existing serialization/deserialization functions in out/readfuncs.c to generate
> the parsetree representation of the DDL command?
>
> It turns out support for DDL commands are mostly missing in out/readfuncs at
> the moment. see the following comment from
> outfuncs.c:
>
> * Output support for raw parsetrees
> * is somewhat incomplete, too; in particular, utility statements are
> * almost entirely unsupported. We try to support everything that can
> * appear in a raw SELECT, though.
>
> So what about adding support for utility statements in out/readfuncs.c so that
> nodeToString()/stringToNode() works on raw parsetrees of utility statements? I
> think there'll be some benefits:
> 1. It's less code compared to introducing brand new
> serialization/deserialization code for DDL commands.
> 2. It's more maintainable since hackers are already familiar with
> out/readfuncs.c.
> 3. Support for utility statements in out/readfuncs.c may turn out to be useful
> for future features that need serialization/deserialization of DDL raw
> parsetrees.
>
> In patch 0012 I explored the above idea by changing the logging format of
> logical ddlmessage to that of the serialization of the raw parsetree of the DDL
> command using nodeToString(), the apply worker deserializes the message
> back to the raw parsetree by calling stringToNode().
> Here are the detailed changes:
> - Adding serialization/deserialization functions in outfuncs.c/readfuncs.c
> for CreateTableStmt, AlterTableStmt, DropStmt, CreateFunctionStmt and
> AlterFunctionStmt.
> - Modified the serialization process to always schema qualify object names,
> this is done by outQualifiedName() and a change in _outRangeVar().
> - Change the input of LogLogicalDDLMessage() to use
> nodeToString(parsetree).
> - Change the apply worker to call stringToNode(ddlmessage) to get the
> raw parsetree back and then directly execute the parsetree.
>
> This patch doesn't introduce any deparsing functionality yet, but it also
> provides the flexibility to edit the command being replayed by directly
> modifying the raw parsetree on the apply worker (e.g. setting the missing_ok
> flag for DropStmt is equivalent to adding "IF EXIST" to the statement).
>
> Thoughts?
Hi,
Thanks for providing this idea.
I looked at the string that is used for replication:
"""
{ALTERTABLESTMT :relation {RANGEVAR :schemaname public :relname foo
:inh true :relpersistence p :alias <> :location 12} :cmds ({ALTERTABLECMD
:subtype 0 :name <> :num 0 :newowner <> :def {COLUMNDEF :colname b
:typeName {TYPENAME :names ("public" "timestamptz") :typeOid 0 :setof
false :pct_type false :typmods <> :typemod -1 :arrayBounds <> :location
29} :compression <> :inhcount 0 :is_local true :is_not_null false
:is_from_type false :storage <> :raw_default <> :cooked_default <>
:identity <> :identitySequence <> :generated <> :collClause <> :collOid 0
:constraints <> :fdwoptions <> :location 27} :behavior 0 :missing_ok
false}) :objtype 41 :missing_ok false}
"""
I think the converted parse-tree string includes lots of internal fields
(e.g. typeOid/pct_type/objtype/collOid/location/...). These are unnecessary
for replication, and we cannot make sure all of this internal state is
consistent between pub/sub. So I am not sure whether replicating this string
is better.
Besides, replicating the string from nodeToString() means we would need to
deal with structure differences between the publisher and the subscriber
whenever any related node structure changes, which doesn't seem good.
IMO, the advantages of the deparsing approach (as implemented in the POC
patch set [1]) are:
1) We can generate a command representation that can be
parsed/processed/transformed arbitrarily by the subscriber using generic rules
(for example, the user can easily replace the schema name in it), while the
result of nodeToString() is not standard JSON, so I am not sure users can
reuse it without traversing the parse tree again.
2) With event trigger + deparser, we can filter out unpublished objects more
easily. For example, "DROP TABLE table_pub, table_unpub;" can be deparsed into
two commands, "DROP TABLE table_pub" and "DROP TABLE table_unpub", and we only
publish the first one.
3) With the deparser, we are able to query the catalog while deparsing to
build a complete command (filled in with schema names, etc.), so the user
doesn't need to do any extra work. We don't need to force the subscriber to
set the same search_path as the publisher, which gives the user more
flexibility.
4) For CREATE TABLE AS, we can separate out the CREATE TABLE part with the
help of the deparser and event trigger. This avoids executing the subquery on
the subscriber.
5) For the ALTER TABLE command, we might want to filter out DDL that uses
volatile functions, as discussed in [2]. We can achieve this more easily by
extending the deparser to check the functions used. We can even use the
deparser to rebuild a command without the unsupported functions and replicate
that.
There may be more cases I am missing, as we are still analyzing other DDLs.
And most of these ideas have been implemented in the POC patch [1].
The overhead of maintaining the deparser is not a problem as long as we have
a good test mechanism to verify that all DDLs are supported.
Overall, the event trigger + deparsing approach looks better to me.
[1] https://www.postgresql.org/message-id/OS0PR01MB5716733D6C79D2198E5773CC94A79%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[2] https://www.postgresql.org/message-id/CAA4eK1JVynFsj%2BmcRWj9sewR2yNUs6LuNxJ0eN-gNJ83oKcUOQ%40mail.gmail.com
Best regards,
Hou zj
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-15 00:14:14 |
| Message-ID: | CAAD30U+oi6e6Vh_zAzhuXzkqUhagmLGD+_iyn2N9w_sNRKsoag@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> Thanks for providing this idea.
>
> I looked at the string that is used for replication:
>
> """
> {ALTERTABLESTMT :relation {RANGEVAR :schemaname public :relname foo
> :inh true :relpersistence p :alias <> :location 12} :cmds ({ALTERTABLECMD
> :subtype 0 :name <> :num 0 :newowner <> :def {COLUMNDEF :colname b
> :typeName {TYPENAME :names ("public" "timestamptz") :typeOid 0 :setof
> false :pct_type false :typmods <> :typemod -1 :arrayBounds <> :location
> 29} :compression <> :inhcount 0 :is_local true :is_not_null false
> :is_from_type false :storage <> :raw_default <> :cooked_default <>
> :identity <> :identitySequence <> :generated <> :collClause <> :collOid 0
> :constraints <> :fdwoptions <> :location 27} :behavior 0 :missing_ok
> false}) :objtype 41 :missing_ok false}
> """
>
> I think the converted parsetree string includes lots of internal
> objects(e.g typeOid/pct_type/objtype/collOid/location/...). These are
> unnecessary stuff for replication and we cannot make sure all the internal
> stuff are consistent among pub/sub. So I am not sure whether replicating
> this string is better.
>
> Besides, replicating the string from nodetostring() means we would need to
> deal with the structure difference between the publisher and the
> subscriber if any related structure has been changed which seems not good.
Yeah, this existing format is not designed to be portable between different
major versions. So it can't directly be used for replication without
serious modification.
> IMO, The advantages of the deparsing approach(as implemented in the POC
> patch set[1]) are:
>
> 1) We can generate a command representation that can be
> parsed/processed/transformed arbitrarily by the subscriber using generic
> rules it(for example: user can easily replace the schema name in it) while
> the results of nodetostring() seems not a standard json string, so I am
> not sure can user reuse it without traversing the parsetree again.
>
> 2) With event_trigger + deparser, we can filter the unpublished objects
> easier. For example: "DROP TABLE table_pub, table_unpub;". We can deparse
> it into two commands "DROP TABLE table_pub" and "DROP TABLE table_pub" and
> only publish the first one.
>
> 3) With deparser, we are able to query the catalog in the deparser to
> build a complete command(filled with schemaname...) which user don't need
> to do any other work for it. We don't need to force the subscriber to set
> the same search_path as the publisher which give user more flexibility.
>
> 4) For CREATE TABLE AS, we can separate out the CREATE TABLE part with the
> help of deparser and event trigger. This can avoid executing the subquery
> on subcriber.
>
> 5) For ALTER TABLE command. We might want to filter out the DDL which use
> volatile function as discussed in [2]. We can achieve this easier by
> extending the deparser to check the functions used. We can even rebuild a
> command without unsupported functions to replicate by using deparser.
>
> There may be more cases I am missing as we are still analyzing other DDLs.
How does the deparser deparse a CREATE FUNCTION statement? Will it
schema-qualify objects inside the function definition?
While I agree that the deparser is needed to handle potential syntax
differences between the pub/sub, I think it's only relevant for the use cases
where only a subset of tables in the database is replicated. For other use
cases where all tables, functions and other objects need to be replicated
(for example, creating a logical replica for a major version upgrade), there
won't be any syntax difference to handle, and the schemas are supposed to
match exactly between the pub/sub. In other words, the user seeks to create an
identical replica of the source database, and the DDLs should be replicated
as is in this case. So I think it's overkill to use the deparser for such use
cases. It also costs more space and time to use deparsing. For example, the
following simple ALTER TABLE command takes 11 times more space in the WAL
record if we use the format from the deparser, and there is also time and CPU
overhead from the deparser.
ALTER TABLE t1 ADD c INT;   (search_path: "$user", public)
VS
{\"fmt\": \"ALTER TABLE %{identity}D %{subcmds:, }s\", \"subcmds\":
[{\"fmt\": \"ADD COLUMN %{definition}s\", \"type\": \"add column\",
\"definiti
on\": {\"fmt\": \"%{name}I %{coltype}T %{default}s %{not_null}s
%{collation}s\", \"name\": \"c\", \"type\": \"column\", \"coltype\":
{\"typmod\": \"\", \"typarray
\": false, \"typename\": \"int4\", \"schemaname\": \"pg_catalog\"},
\"default\": {\"fmt\": \"DEFAULT %{default}s\", \"present\": false},
\"not_null\": \"\", \"col
lation\": {\"fmt\": \"COLLATE %{name}D\", \"present\": false}}}],
\"identity\": {\"objname\": \"t1\", \"schemaname\": \"public\"}}
So I think it's better to define DDL replication levels [1] tailored to the
two different use cases. We can use a different logging format based on the
DDL replication level. For example, we can simply log the DDL query string and
the search_path for database-level DDL replication, but for table-level DDL
replication we need to use the deparser format in order to handle potential
syntax differences and schema mapping requests.
Thoughts?
With Regards,
Zheng
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-15 04:00:31 |
| Message-ID: | CAA4eK1JKK9LACPovjogS-LThQBscwkrxBy9RuA6aHFP=vTGjtg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, Jun 15, 2022 at 5:44 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
>
> While I agree that the deparser is needed to handle the potential
> syntax differences between
> the pub/sub, I think it's only relevant for the use cases where only a
> subset of tables in the database
> are replicated. For other use cases where all tables, functions and
> other objects need to be replicated,
> (for example, creating a logical replica for major version upgrade)
> there won't be any syntax difference to
> handle and the schemas are supposed to match exactly between the
> pub/sub. In other words the user seeks to create an identical replica
> of the source database and the DDLs should be replicated
> as is in this case.
>
I think even for database-level replication we can't assume that the source
and target will always have the same data, in which case statements like
"Create Table As ..." and "Alter Table ..." can't be replicated as is, because
that can lead to different results. The other point is that tomorrow we could
extend the database-level option/syntax to exclude a few objects (something
like [1]) as well, in which case we again need to filter at the publisher
level.
>
So I think it's an overkill to use deparser for
> such use cases. It also costs more space and
> time using deparsing. For example, the following simple ALTER TABLE
> command incurs 11 times more space
> in the WAL record if we were to use the format from the deparser,
> there will also be time and CPU overhead from the deparser.
>
...
>
> So I think it's better to define DDL replication levels [1] to tailor
> for the two different use cases. We can use different logging format
> based on the DDL replication level. For example,
> we can simply log the DDL query string and the search_path for
> database level DDL replication. But for table level DDL replication we
> need to use the deparser format in order to
> handle the potential syntax differences and schema mapping requests.
>
I think having different logging formats is worth considering, but I am not
sure we can distinguish between database- and table-level replication, for the
reasons mentioned above. One thing which may need a different format is the
replication of global objects like roles, tablespaces, etc., but we haven't
analyzed those in detail yet. I feel we may also need a different syntax
altogether to replicate such objects. Also, I think we may want to optimize
the current format in some cases so that the WAL volume can be reduced.
I feel that if we think deparsing is required for this project, then at this
stage it would probably be a good idea to explore independent ways to test it.
One way is to test via the logical replication of DDL (i.e. via this patch),
and the other is to write an independent test suite, as Sawada-san seems to be
suggesting above [2]. I am not sure if there has been any progress on the
independent test suite front yet.
[1] - https://commitfest.postgresql.org/38/3646/
[2] - https://www.postgresql.org/message-id/CAD21AoAX_xiO03hXSY2QfbcKT0XiUvtnzTjy%2BNRJ_EcgBa5B3A%40mail.gmail.com
--
With Regards,
Amit Kapila.
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-06-15 06:45:37 |
| Message-ID: | OS0PR01MB571688CA7CAAE8B01089311294AD9@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wednesday, June 15, 2022 8:14 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> > Thanks for providing this idea.
> >
> > I looked at the string that is used for replication:
> >
> > """
> > {ALTERTABLESTMT :relation {RANGEVAR :schemaname public :relname foo
> > :inh true :relpersistence p :alias <> :location 12} :cmds ({ALTERTABLECMD
> > :subtype 0 :name <> :num 0 :newowner <> :def {COLUMNDEF :colname b
> > :typeName {TYPENAME :names ("public" "timestamptz") :typeOid 0 :setof
> > false :pct_type false :typmods <> :typemod -1 :arrayBounds <> :location
> > 29} :compression <> :inhcount 0 :is_local true :is_not_null false
> > :is_from_type false :storage <> :raw_default <> :cooked_default <>
> > :identity <> :identitySequence <> :generated <> :collClause <> :collOid 0
> > :constraints <> :fdwoptions <> :location 27} :behavior 0 :missing_ok
> > false}) :objtype 41 :missing_ok false}
> > """
> >
> > I think the converted parsetree string includes lots of internal
> > objects(e.g typeOid/pct_type/objtype/collOid/location/...). These are
> > unnecessary stuff for replication and we cannot make sure all the internal
> > stuff are consistent among pub/sub. So I am not sure whether replicating
> > this string is better.
> >
> > Besides, replicating the string from nodetostring() means we would need to
> > deal with the structure difference between the publisher and the
> > subscriber if any related structure has been changed which seems not good.
>
> Yeah, this existing format is not designed to be portable between different
> major versions. So it can't directly be used for replication without
> serious modification.
>
> > IMO, The advantages of the deparsing approach(as implemented in the POC
> > patch set[1]) are:
> >
> > 1) We can generate a command representation that can be
> > parsed/processed/transformed arbitrarily by the subscriber using generic
> > rules it(for example: user can easily replace the schema name in it) while
> > the results of nodetostring() seems not a standard json string, so I am
> > not sure can user reuse it without traversing the parsetree again.
> >
> > 2) With event_trigger + deparser, we can filter the unpublished objects
> > easier. For example: "DROP TABLE table_pub, table_unpub;". We can deparse
> > it into two commands "DROP TABLE table_pub" and "DROP TABLE
> table_pub" and
> > only publish the first one.
> >
> > 3) With deparser, we are able to query the catalog in the deparser to
> > build a complete command(filled with schemaname...) which user don't need
> > to do any other work for it. We don't need to force the subscriber to set
> > the same search_path as the publisher which give user more flexibility.
> >
> > 4) For CREATE TABLE AS, we can separate out the CREATE TABLE part with the
> > help of deparser and event trigger. This can avoid executing the subquery
> > on subcriber.
> >
> > 5) For ALTER TABLE command. We might want to filter out the DDL which use
> > volatile function as discussed in [2]. We can achieve this easier by
> > extending the deparser to check the functions used. We can even rebuild a
> > command without unsupported functions to replicate by using deparser.
> >
> > There may be more cases I am missing as we are still analyzing other DDLs.
>
> How does the deparser deparses CREATE FUNCTION STATEMENT? Will it
> schema qualify
> objects inside the function definition?
The current deparser doesn't schema-qualify objects inside the function
source, as we won't know the schema of the inner objects until the function is
executed. The deparser only schema-qualifies the objects around the function
declaration, like:
CREATE FUNCTION [public].test_func(i [pg_catalog].int4) RETURNS [pg_catalog].int4 LANGUAGE plpgsql
Best regards,
Hou zj
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-17 19:38:03 |
| Message-ID: | CAAD30U+wDPDFzUoPkSg2WYMNCXWNc8wa7GYB1Tzh_2PNUBsEHA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, Jun 15, 2022 at 12:00 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Jun 15, 2022 at 5:44 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> >
> >
> > While I agree that the deparser is needed to handle the potential
> > syntax differences between
> > the pub/sub, I think it's only relevant for the use cases where only a
> > subset of tables in the database
> > are replicated. For other use cases where all tables, functions and
> > other objects need to be replicated,
> > (for example, creating a logical replica for major version upgrade)
> > there won't be any syntax difference to
> > handle and the schemas are supposed to match exactly between the
> > pub/sub. In other words the user seeks to create an identical replica
> > of the source database and the DDLs should be replicated
> > as is in this case.
> >
>
> I think even for database-level replication we can't assume that
> source and target will always have the same data in which case "Create
> Table As ..", "Alter Table .. " kind of statements can't be replicated
> as it is because that can lead to different results.
"Create Table As .." is already handled by setting the skipData flag of
the statement parsetreee before replay:
/*
* Force skipping data population to avoid data inconsistency.
* Data should be replicated from the publisher instead.
*/
castmt->into->skipData = true;
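(In effect this makes the apply worker run the statement as if it had been
written WITH NO DATA; a rough SQL-level equivalent, with made-up table names,
is sketched below. The rows then arrive through normal logical replication.)

-- Roughly what the subscriber ends up executing for a replicated CTAS:
-- the table structure is created, no rows are populated, and the data
-- follows as ordinary replicated INSERTs.
CREATE TABLE ctas_copy AS
  SELECT a, b FROM src_table
  WITH NO DATA;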
"Alter Table .. " that rewrites with volatile expressions can also be handled
without any syntax change, by enabling the table rewrite replication and
converting the rewrite inserts to updates. ZJ's patch introduced this solution.
I've also adopted this approach in my latest patch
0012-Support-replication-of-ALTER-TABLE-commands-that-rew.patch
> The other point
> is tomorrow we can extend the database level option/syntax to exclude
> a few objects (something like [1]) as well in which case we again need
> to filter at the publisher level
I think for such cases it's not full database replication, and we could treat
it as table-level DDL replication, i.e. use the deparser format.
> > So I think it's better to define DDL replication levels [1] to tailor
> > for the two different use cases. We can use different logging format
> > based on the DDL replication level. For example,
> > we can simply log the DDL query string and the search_path for
> > database level DDL replication. But for table level DDL replication we
> > need to use the deparser format in order to
> > handle the potential syntax differences and schema mapping requests.
> >
>
> I think having different logging formats is worth considering but I am
> not sure we can distinguish it for database and table level
> replication because of the reasons mentioned above. One thing which
> may need a different format is the replication of global objects like
> roles, tablespace, etc. but we haven't analyzed them in detail about
> those. I feel we may also need a different syntax altogether to
> replicate such objects.
Yes, global objects are not schema-qualified, so we probably don't need to
use the deparser format for them. We plan to do some evaluation of the
replication of global objects.
> Also, I think we may want to optimize the
> current format in some cases so that the WAL amount could be reduced.
>
> I feel if we think that deparsing is required for this project then
> probably at this stage it would be a good idea to explore ways to have
> independent ways to test it. One way is to do testing via the logical
> replication of DDL (aka via this patch) and the other is to write an
> independent test suite as Sawada-San seems to be speculating above
> [2]. I am not sure if there is any progress yet on the independent
> test suite front yet.
Testing DDL deparsing support has been discussed before in [1]; we will also
take a close look at it.
Regards,
Zheng
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-06-20 03:31:37 |
| Message-ID: | OS0PR01MB5716DBA31AC57163D2CF465E94B09@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Saturday, June 18, 2022 3:38 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> On Wed, Jun 15, 2022 at 12:00 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >
> > On Wed, Jun 15, 2022 at 5:44 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > >
> > >
> > > While I agree that the deparser is needed to handle the potential
> > > syntax differences between the pub/sub, I think it's only relevant
> > > for the use cases where only a subset of tables in the database are
> > > replicated. For other use cases where all tables, functions and
> > > other objects need to be replicated, (for example, creating a
> > > logical replica for major version upgrade) there won't be any syntax
> > > difference to handle and the schemas are supposed to match exactly
> > > between the pub/sub. In other words the user seeks to create an
> > > identical replica of the source database and the DDLs should be
> > > replicated as is in this case.
> > >
> >
> > I think even for database-level replication we can't assume that
> > source and target will always have the same data in which case "Create
> > Table As ..", "Alter Table .. " kind of statements can't be replicated
> > as it is because that can lead to different results.
> "Create Table As .." is already handled by setting the skipData flag of the
> statement parsetreee before replay:
>
> /*
> * Force skipping data population to avoid data inconsistency.
> * Data should be replicated from the publisher instead.
> */
> castmt->into->skipData = true;
>
> "Alter Table .. " that rewrites with volatile expressions can also be handled
> without any syntax change, by enabling the table rewrite replication and
> converting the rewrite inserts to updates. ZJ's patch introduced this solution.
> I've also adopted this approach in my latest patch
> 0012-Support-replication-of-ALTER-TABLE-commands-that-rew.patch
>
> > The other point
> > is tomorrow we can extend the database level option/syntax to exclude
> > a few objects (something like [1]) as well in which case we again need
> > to filter at the publisher level
>
> I think for such cases it's not full database replication and we could treat it as
> table level DDL replication, i.e. use the the deparser format.
Hi,
Here are some points in my mind about the two approaches discussed here.
1) search_path vs schema qualify
Again, I still think schema qualifying the objects in the DDL command will
bring more flexibility and security, as mentioned before[1].
Besides, a schema-qualified DDL is also more appropriate for other use
cases (e.g. table-level replication), as the schema can differ between
pub/sub and it's easy to cause unexpected and undetectable failures if we
just log the search_path.
It makes more sense to me to have the same style of WAL log (schema
qualified) for both database-level and table-level replication, as it will
bring more flexibility.
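For illustration (schema and table names here are hypothetical), the two
logging styles would roughly look like:
-- Raw SQL plus search_path, as executed on the publisher:
SET search_path = s1, public;
CREATE TABLE t1 (a int);                 -- resolves to s1.t1 on the publisher
-- Schema-qualified style, roughly what a deparser-based WAL record would carry:
CREATE TABLE s1.t1 (a pg_catalog.int4);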
> "Create Table As .." is already handled by setting the skipData flag of the
> statement parsetreee before replay:
2) About the handling of CREATE TABLE AS:
I think it's not an appropriate approach to set the skipData flag on the
subscriber, as it cannot handle the EXECUTE command in CTAS:
CREATE TABLE q5_prep_results AS EXECUTE q5(200, 'DTAAAA');
The prepared statement is a temporary object which we don't replicate, so
if you directly execute the original SQL on the subscriber it will fail
even if you set skipData.
I think it's difficult to make this work, as you need to handle the
create/drop of this prepared statement. And even if we extended the
subscriber's code to make it work, it doesn't seem like a standard and
elegant approach.
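To spell out the failing case (some_tbl and its columns are invented for
the example):
-- On the publisher, within one session:
PREPARE q5 (int, text) AS SELECT * FROM some_tbl WHERE a = $1 AND b = $2;
CREATE TABLE q5_prep_results AS EXECUTE q5(200, 'DTAAAA');
-- Replaying the same statement on the subscriber fails even with skipData
-- set, because q5 only ever existed in the publisher's session:
-- ERROR:  prepared statement "q5" does not exist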
> "Alter Table .. " that rewrites with volatile expressions can also be handled
> without any syntax change, by enabling the table rewrite replication and
> converting the rewrite inserts to updates. ZJ's patch introduced this solution.
3) About the handling of ALTER TABLE rewrite.
The approach I proposed before is based on the event trigger + deparser
approach. We were able to improve that approach because we don't need to
replicate the rewrite in many cases. For example, we don't need to
replicate the rewrite DML if there is no volatile/mutable function. We
should check and filter these cases at the publisher (e.g. via the
deparser) instead of checking them at the subscriber.
Besides, as discussed, we need to give a warning or error for the cases
when the DDL contains a volatile function which would be executed[2]. We
should check this at the publisher as well (via the deparser).
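For illustration (table and column names are made up), the distinction is:
-- No volatile function: the rewrite is deterministic, so replicating the
-- command alone is enough and the rewritten rows need not be shipped.
ALTER TABLE t1 ALTER COLUMN a TYPE bigint;
-- Volatile function evaluated during the rewrite: each node would compute
-- different values, so this is the case that needs the rewrite replicated
-- (or a WARNING/ERROR raised).
ALTER TABLE t1 ALTER COLUMN a TYPE text USING (a || '-' || random()::text);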
> I think for such cases it's not full database replication and we could treat it as
> table level DDL replication, i.e. use the the deparser format.
4) I think the point could be that we should make the WAL log format
extendable so that we can extend it to support more useful features
(table filters/schema maps/DDL filters). If we just WAL-log the original
SQL, it seems difficult to extend it in the future.
[1] https://www.postgresql.org/message-id/202204081134.6tcmf5cxl3sz%40alvherre.pgsql
[2] https://www.postgresql.org/message-id/CAA4eK1JVynFsj%2BmcRWj9sewR2yNUs6LuNxJ0eN-gNJ83oKcUOQ%40mail.gmail.com
Best regards,
Hou zj
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-06-21 12:19:15 |
| Message-ID: | OS0PR01MB5716B1526C2EDA66907E733B94B39@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Monday, June 20, 2022 11:32 AM houzj(dot)fnst(at)fujitsu(dot)com <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Saturday, June 18, 2022 3:38 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > On Wed, Jun 15, 2022 at 12:00 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > wrote:
> > >
> > > On Wed, Jun 15, 2022 at 5:44 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > > >
> > > >
> > > > While I agree that the deparser is needed to handle the potential
> > > > syntax differences between the pub/sub, I think it's only relevant
> > > > for the use cases where only a subset of tables in the database are
> > > > replicated. For other use cases where all tables, functions and
> > > > other objects need to be replicated, (for example, creating a
> > > > logical replica for major version upgrade) there won't be any syntax
> > > > difference to handle and the schemas are supposed to match exactly
> > > > between the pub/sub. In other words the user seeks to create an
> > > > identical replica of the source database and the DDLs should be
> > > > replicated as is in this case.
> > > >
> > >
> > > I think even for database-level replication we can't assume that
> > > source and target will always have the same data in which case "Create
> > > Table As ..", "Alter Table .. " kind of statements can't be replicated
> > > as it is because that can lead to different results.
> > "Create Table As .." is already handled by setting the skipData flag of the
> > statement parsetreee before replay:
> >
> > /*
> > * Force skipping data population to avoid data inconsistency.
> > * Data should be replicated from the publisher instead.
> > */
> > castmt->into->skipData = true;
> >
> > "Alter Table .. " that rewrites with volatile expressions can also be handled
> > without any syntax change, by enabling the table rewrite replication and
> > converting the rewrite inserts to updates. ZJ's patch introduced this solution.
> > I've also adopted this approach in my latest patch
> > 0012-Support-replication-of-ALTER-TABLE-commands-that-rew.patch
> >
> > > The other point
> > > is tomorrow we can extend the database level option/syntax to exclude
> > > a few objects (something like [1]) as well in which case we again need
> > > to filter at the publisher level
> >
> > I think for such cases it's not full database replication and we could treat it as
> > table level DDL replication, i.e. use the the deparser format.
>
> Hi,
>
> Here are some points in my mind about the two approaches discussed here.
>
> 1) search_patch vs schema qualify
>
> Again, I still think it will bring more flexibility and security by schema qualify the
> objects in DDL command as mentioned before[1].
>
> Besides, a schema qualified DDL is also more appropriate for other use
> cases(e.g. a table-level replication). As it's possible the schema is different
> between pub/sub and it's easy to cause unexpected and undetectable failure if
> we just log the search_path.
>
> It makes more sense to me to have the same style WAL log(schema qualified)
> for
> both database level or table level replication as it will bring more
> flexibility.
>
>
> > "Create Table As .." is already handled by setting the skipData flag of the
> > statement parsetreee before replay:
>
> 2) About the handling of CREATE TABLE AS:
>
> I think it's not a appropriate approach to set the skipdata flag on subscriber
> as it cannot handle EXECUTE command in CTAS.
>
> CREATE TABLE q5_prep_results AS EXECUTE q5(200, 'DTAAAA');
>
> The Prepared statement is a temporary object which we don't replicate. So if
> you directly execute the original SQL on subscriber, even if you set skipdata
> it will fail.
>
> I think it difficult to make this work as you need handle the create/drop of
> this prepared statement. And even if we extended subscriber's code to make it
> work, it doesn't seems like a standard and elegant approach.
>
>
> > "Alter Table .. " that rewrites with volatile expressions can also be handled
> > without any syntax change, by enabling the table rewrite replication and
> > converting the rewrite inserts to updates. ZJ's patch introduced this solution.
>
> 3) About the handling of ALTER TABLE rewrite.
>
> The approach I proposed before is based on the event trigger + deparser
> approach. We were able to improve that approach as we don't need to
> replicate
> the rewrite in many cases. For example: we don't need to replicate rewrite dml
> if there is no volatile/mutable function. We should check and filter these case
> at publisher (e.g. via deparser) instead of checking that at subscriber.
>
> Besides, as discussed, we need to give warning or error for the cases when DDL
> contains volatile function which would be executed[2]. We should check this at
> publisher as well(via deparser).
>
>
> > I think for such cases it's not full database replication and we could treat it as
> > table level DDL replication, i.e. use the the deparser format.
>
> 4) I think the point could be that we should make the WAL log format
> extendable
> so that we can extend it to support more useful feature(table filter/schema
> maps/DDL filter). If we just WAL log the original SQL, it seems it's difficult
> to extend it in the future ?
Attached is the new version patch set, which adds support for the
CREATE/DROP/ALTER SEQUENCE and CREATE/DROP SCHEMA DDL commands provided by
Ajin Cherian off list.
The new version patch will also check a function's volatility[1] in the
ALTER TABLE command. If any function to be executed is volatile, we report
an ERROR. Whether a WARNING would be better here is still under
consideration.
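As a made-up example of the kind of ALTER TABLE the check would now
reject (a volatile default must be evaluated for every existing row during
the table rewrite, so each node would get different values):
ALTER TABLE t1 ADD COLUMN created_at timestamptz DEFAULT clock_timestamp();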
Best regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v9-0002-Support-DDL-replication.patch | application/octet-stream | 123.7 KB |
| v9-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 103.0 KB |
| v9-0003-support-CREATE-TABLE-ASSELECT-INTO.patch | application/octet-stream | 14.1 KB |
| v9-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 22.5 KB |
| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-22 05:38:36 |
| Message-ID: | CAD21AoBVCoPPRKvU_5-=wEXsa92GsNJFJOcYyXzvoSEJCx5dKw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, Jun 15, 2022 at 1:00 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Jun 15, 2022 at 5:44 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> >
> >
> > While I agree that the deparser is needed to handle the potential
> > syntax differences between
> > the pub/sub, I think it's only relevant for the use cases where only a
> > subset of tables in the database
> > are replicated. For other use cases where all tables, functions and
> > other objects need to be replicated,
> > (for example, creating a logical replica for major version upgrade)
> > there won't be any syntax difference to
> > handle and the schemas are supposed to match exactly between the
> > pub/sub. In other words the user seeks to create an identical replica
> > of the source database and the DDLs should be replicated
> > as is in this case.
> >
>
> I think even for database-level replication we can't assume that
> source and target will always have the same data in which case "Create
> Table As ..", "Alter Table .. " kind of statements can't be replicated
> as it is because that can lead to different results. The other point
> is tomorrow we can extend the database level option/syntax to exclude
> a few objects (something like [1]) as well in which case we again need
> to filter at the publisher level.
Good point.
Regarding the idea of using the parse-tree representation produced by
nodeToString(), I’ve not read the patch yet but I'm not sure it's a
good idea. A field name of a node could be changed in a major version.
If a publisher sends a parse-tree string representation to a newer
major version subscriber, the subscriber needs to be able to parse the
old format parse-tree string representation in order to reconstruct
the DDL, which reduces the maintainability much. On the other hand,
the format of deparsed json string would not be changed often.
>
> >
> So I think it's an overkill to use deparser for
> > such use cases. It also costs more space and
> > time using deparsing. For example, the following simple ALTER TABLE
> > command incurs 11 times more space
> > in the WAL record if we were to use the format from the deparser,
> > there will also be time and CPU overhead from the deparser.
> >
> ...
> >
> > So I think it's better to define DDL replication levels [1] to tailor
> > for the two different use cases. We can use different logging format
> > based on the DDL replication level. For example,
> > we can simply log the DDL query string and the search_path for
> > database level DDL replication. But for table level DDL replication we
> > need to use the deparser format in order to
> > handle the potential syntax differences and schema mapping requests.
> >
>
> I think having different logging formats is worth considering but I am
> not sure we can distinguish it for database and table level
> replication because of the reasons mentioned above. One thing which
> may need a different format is the replication of global objects like
> roles, tablespace, etc. but we haven't analyzed them in detail about
> those. I feel we may also need a different syntax altogether to
> replicate such objects. Also, I think we may want to optimize the
> current format in some cases so that the WAL amount could be reduced.
>
> I feel if we think that deparsing is required for this project then
> probably at this stage it would be a good idea to explore ways to have
> independent ways to test it. One way is to do testing via the logical
> replication of DDL (aka via this patch) and the other is to write an
> independent test suite as Sawada-San seems to be speculating above
> [2]. I am not sure if there is any progress yet on the independent
> test suite front yet.
I've attached a WIP patch for adding regression tests for DDL deparse.
The patch can be applied on top of
v9-0001-Functions-to-deparse-DDL-commands.patch, which Hou recently
submitted[1]. The basic idea is to define an event trigger to deparse
DDLs, run the regression tests, load the deparsed DDLs into another
database cluster, dump both databases, and compare the dumps. Since the
patch doesn't support deparsing all DDLs and there is a bug[2], the
attached regression test covers CREATE TABLE and some ALTER TABLE commands
instead of running the full regression tests.
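As a rough sketch of the capture side of that idea (deparse_ddl_json()
below is only a placeholder name for whatever deparse entry point the
patch exposes, not an existing function):
CREATE TABLE deparsed_ddls (id serial PRIMARY KEY, ddl text);

CREATE OR REPLACE FUNCTION capture_ddl() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    r record;
BEGIN
    -- pg_event_trigger_ddl_commands() exposes each command run by the
    -- current statement; store its deparsed form for later replay.
    FOR r IN SELECT * FROM pg_event_trigger_ddl_commands()
    LOOP
        INSERT INTO deparsed_ddls (ddl) VALUES (deparse_ddl_json(r.command));
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER capture_ddl_trigger ON ddl_command_end
    EXECUTE FUNCTION capture_ddl();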
Regards,
[1] https://www.postgresql.org/message-id/OS0PR01MB5716B1526C2EDA66907E733B94B39%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[2] deparsing "ALTER INDEX tbl_idx ALTER COLUMN 2 SET STATISTICS
1000;" causes an assertion failure.
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-WIP-Add-regression-tests-for-DDL-deparse.patch | application/octet-stream | 4.9 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-22 09:07:44 |
| Message-ID: | CALDaNm33W35pcBE3zOpJhwnYBdBoZDpKxssemAN21NwVhJuong@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, Jun 21, 2022 at 5:49 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Monday, June 20, 2022 11:32 AM houzj(dot)fnst(at)fujitsu(dot)com <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Saturday, June 18, 2022 3:38 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > > On Wed, Jun 15, 2022 at 12:00 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > > wrote:
> > > >
> > > > On Wed, Jun 15, 2022 at 5:44 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > > > >
> > > > >
> > > > > While I agree that the deparser is needed to handle the potential
> > > > > syntax differences between the pub/sub, I think it's only relevant
> > > > > for the use cases where only a subset of tables in the database are
> > > > > replicated. For other use cases where all tables, functions and
> > > > > other objects need to be replicated, (for example, creating a
> > > > > logical replica for major version upgrade) there won't be any syntax
> > > > > difference to handle and the schemas are supposed to match exactly
> > > > > between the pub/sub. In other words the user seeks to create an
> > > > > identical replica of the source database and the DDLs should be
> > > > > replicated as is in this case.
> > > > >
> > > >
> > > > I think even for database-level replication we can't assume that
> > > > source and target will always have the same data in which case "Create
> > > > Table As ..", "Alter Table .. " kind of statements can't be replicated
> > > > as it is because that can lead to different results.
> > > "Create Table As .." is already handled by setting the skipData flag of the
> > > statement parsetreee before replay:
> > >
> > > /*
> > > * Force skipping data population to avoid data inconsistency.
> > > * Data should be replicated from the publisher instead.
> > > */
> > > castmt->into->skipData = true;
> > >
> > > "Alter Table .. " that rewrites with volatile expressions can also be handled
> > > without any syntax change, by enabling the table rewrite replication and
> > > converting the rewrite inserts to updates. ZJ's patch introduced this solution.
> > > I've also adopted this approach in my latest patch
> > > 0012-Support-replication-of-ALTER-TABLE-commands-that-rew.patch
> > >
> > > > The other point
> > > > is tomorrow we can extend the database level option/syntax to exclude
> > > > a few objects (something like [1]) as well in which case we again need
> > > > to filter at the publisher level
> > >
> > > I think for such cases it's not full database replication and we could treat it as
> > > table level DDL replication, i.e. use the the deparser format.
> >
> > Hi,
> >
> > Here are some points in my mind about the two approaches discussed here.
> >
> > 1) search_patch vs schema qualify
> >
> > Again, I still think it will bring more flexibility and security by schema qualify the
> > objects in DDL command as mentioned before[1].
> >
> > Besides, a schema qualified DDL is also more appropriate for other use
> > cases(e.g. a table-level replication). As it's possible the schema is different
> > between pub/sub and it's easy to cause unexpected and undetectable failure if
> > we just log the search_path.
> >
> > It makes more sense to me to have the same style WAL log(schema qualified)
> > for
> > both database level or table level replication as it will bring more
> > flexibility.
> >
> >
> > > "Create Table As .." is already handled by setting the skipData flag of the
> > > statement parsetreee before replay:
> >
> > 2) About the handling of CREATE TABLE AS:
> >
> > I think it's not a appropriate approach to set the skipdata flag on subscriber
> > as it cannot handle EXECUTE command in CTAS.
> >
> > CREATE TABLE q5_prep_results AS EXECUTE q5(200, 'DTAAAA');
> >
> > The Prepared statement is a temporary object which we don't replicate. So if
> > you directly execute the original SQL on subscriber, even if you set skipdata
> > it will fail.
> >
> > I think it difficult to make this work as you need handle the create/drop of
> > this prepared statement. And even if we extended subscriber's code to make it
> > work, it doesn't seems like a standard and elegant approach.
> >
> >
> > > "Alter Table .. " that rewrites with volatile expressions can also be handled
> > > without any syntax change, by enabling the table rewrite replication and
> > > converting the rewrite inserts to updates. ZJ's patch introduced this solution.
> >
> > 3) About the handling of ALTER TABLE rewrite.
> >
> > The approach I proposed before is based on the event trigger + deparser
> > approach. We were able to improve that approach as we don't need to
> > replicate
> > the rewrite in many cases. For example: we don't need to replicate rewrite dml
> > if there is no volatile/mutable function. We should check and filter these case
> > at publisher (e.g. via deparser) instead of checking that at subscriber.
> >
> > Besides, as discussed, we need to give warning or error for the cases when DDL
> > contains volatile function which would be executed[2]. We should check this at
> > publisher as well(via deparser).
> >
> >
> > > I think for such cases it's not full database replication and we could treat it as
> > > table level DDL replication, i.e. use the the deparser format.
> >
> > 4) I think the point could be that we should make the WAL log format
> > extendable
> > so that we can extend it to support more useful feature(table filter/schema
> > maps/DDL filter). If we just WAL log the original SQL, it seems it's difficult
> > to extend it in the future ?
>
> Attach the new version patch set which added support for CREATE/DROP/ATER
> Sequence and CREATE/DROP Schema ddl commands which are provided by Ajin
> Cherian off list.
>
> The new version patch will also check function's volatility[1] in ALTER TABLE
> command. If any function to be executed is volatile, we report an ERROR.
> Whether WARNING is better to be used here is still under consideration.
>
> [1] https://www.postgresql.org/message-id/CAA4eK1JVynFsj%2BmcRWj9sewR2yNUs6LuNxJ0eN-gNJ83oKcUOQ%40mail.gmail.com
I noticed a couple of things while trying to apply and review the patch:
1) Creation of temporary table fails infinitely in the subscriber.
CREATE TEMPORARY TABLE temp1 (a int primary key);
The above statement is converted to the below format:
CREATE TEMPORARY TABLE pg_temp.temp1 (a pg_catalog.int4 ,
CONSTRAINT temp1_pkey PRIMARY KEY (a));
While handling the creation of the temporary table in the worker, it
fails continuously with the following error:
2022-06-22 14:24:01.317 IST [240872] ERROR: schema "pg_temp" does not exist
2022-06-22 14:24:01.317 IST [240872] CONTEXT: processing remote data
for replication origin "pg_16384" during "DDL" in transaction 725
finished at 0/14BBDA8
This error comes from handle_create_table->get_namespace_oid, which checks
whether the pg_temp namespace is present in the system. As the pg_temp
namespace is not present, it throws an error.
I saw that a known issue regarding partitioned tables is mentioned in the
commit message. We should mention this temporary-table issue in the commit
message as well until it is resolved.
2) There are a few whitespace errors while applying the patch:
git am v9-0001-Functions-to-deparse-DDL-commands.patch
Applying: Functions to deparse DDL commands.
.git/rebase-apply/patch:1480: indent with spaces.
ObjTree *tmp;
.git/rebase-apply/patch:1481: indent with spaces.
char *tmpstr;
.git/rebase-apply/patch:1483: indent with spaces.
tmpstr = psprintf(INT64_FORMAT, seqdata->seqcache);
.git/rebase-apply/patch:1484: indent with spaces.
tmp = new_objtree_VA("CACHE %{value}s",
.git/rebase-apply/patch:1485: indent with spaces.
2,
warning: squelched 140 whitespace errors
warning: 145 lines add whitespace errors.
Regards,
Vignesh
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-22 22:22:00 |
| Message-ID: | CAAD30U+76WKKuMzNWt6uS48P7yUku+feBgtoFw9juYGYBtEweQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> Here are some points in my mind about the two approaches discussed here.
>
> 1) search_patch vs schema qualify
>
> Again, I still think it will bring more flexibility and security by schema qualify the
> objects in DDL command as mentioned before[1].
I wonder what security concerns you have? We certainly don't want to log
the search_path if there are serious security issues.
> Besides, a schema qualified DDL is also more appropriate for other use
> cases(e.g. a table-level replication). As it's possible the schema is different
> between pub/sub and it's easy to cause unexpected and undetectable failure if
> we just log the search_path.
>
> It makes more sense to me to have the same style WAL log(schema qualified) for
> both database level or table level replication as it will bring more
> flexibility.
I think it's reasonable to consider using different formats for the two
different use cases, especially if the space and time overhead of the
deparser format sticks out. I also don't think we need to use the deparser
for global-object DDL such as ROLE statements, because no schema
qualification is needed. Another issue with ROLE statements is that they
are currently not captured by event triggers.
> > "Create Table As .." is already handled by setting the skipData flag of the
> > statement parsetreee before replay:
>
> 2) About the handling of CREATE TABLE AS:
>
> I think it's not a appropriate approach to set the skipdata flag on subscriber
> as it cannot handle EXECUTE command in CTAS.
>
> CREATE TABLE q5_prep_results AS EXECUTE q5(200, 'DTAAAA');
>
> The Prepared statement is a temporary object which we don't replicate. So if
> you directly execute the original SQL on subscriber, even if you set skipdata
> it will fail.
>
> I think it difficult to make this work as you need handle the create/drop of
> this prepared statement. And even if we extended subscriber's code to make it
> work, it doesn't seems like a standard and elegant approach.
This is indeed an interesting case, thanks for pointing it out. One
lightweight solution I can think of is to directly deparse the parse tree
on the publisher into a simple CREATE TABLE statement without the prepared
statement, and then replicate that simple CREATE TABLE statement. This
doesn't have to involve the JSON format, though.
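For example, for the statement above, the publisher could log and
replicate a plain definition along these lines (the column list is
invented; in practice it would come from the prepared statement's result
description on the publisher), with the data then arriving through normal
DML replication, matching the skipData semantics:
CREATE TABLE q5_prep_results AS EXECUTE q5(200, 'DTAAAA');
-- could be logged as:
CREATE TABLE public.q5_prep_results (a pg_catalog.int4, b text);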
> > "Alter Table .. " that rewrites with volatile expressions can also be handled
> > without any syntax change, by enabling the table rewrite replication and
> > converting the rewrite inserts to updates. ZJ's patch introduced this solution.
>
> 3) About the handling of ALTER TABLE rewrite.
>
> The approach I proposed before is based on the event trigger + deparser
> approach. We were able to improve that approach as we don't need to replicate
> the rewrite in many cases. For example: we don't need to replicate rewrite dml
> if there is no volatile/mutable function. We should check and filter these case
> at publisher (e.g. via deparser) instead of checking that at subscriber.
Surely we can make the check for volatile/mutable functions on the
publisher side as well. It doesn't have to be done via the deparser.
> Besides, as discussed, we need to give warning or error for the cases when DDL
> contains volatile function which would be executed[2]. We should check this at
> publisher as well(via deparser).
Again, I think the check doesn't have to be done via the deparser.
> > I think for such cases it's not full database replication and we could treat it as
> > table level DDL replication, i.e. use the the deparser format.
>
> 4) I think the point could be that we should make the WAL log format extendable
> so that we can extend it to support more useful feature(table filter/schema
> maps/DDL filter). If we just WAL log the original SQL, it seems it's difficult
> to extend it in the future ?
My point is that for full-replication/version-upgrade use cases we don't
need to worry about extending it for features such as schema mapping,
because such use cases naturally want to keep identical schema structures.
With Regards,
Zheng
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-06-23 07:09:11 |
| Message-ID: | OS0PR01MB5716768A1AE79CC77836EA3594B59@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thursday, June 23, 2022 6:22 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
Hi,
>
> > Here are some points in my mind about the two approaches discussed here.
> >
> > 1) search_patch vs schema qualify
> >
> > Again, I still think it will bring more flexibility and security by
> > schema qualify the objects in DDL command as mentioned before[1].
>
> I wonder what security concerns you have? We certainly don't want to log the
> search_path if there are serious security issues.
I was thinking of the case where the publisher has two schemas, "s1" and
"s2", while the subscriber only has schema "s2". Suppose we set the
publisher's search_path to 's1, s2' and execute CREATE TABLE xxx ();. If
we replicate the original SQL along with the search_path to the
subscriber, it would silently create the table in schema s2 instead of
reporting an error like "schema s1 does not exist", which looks dangerous
to me.
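Spelled out (using the same schema names):
-- Publisher: both s1 and s2 exist.
SET search_path = s1, s2;
CREATE TABLE xxx ();          -- created as s1.xxx
-- Subscriber: only s2 exists. Replaying the raw SQL with the same
-- search_path silently creates s2.xxx; no error is raised because the
-- missing schema is simply skipped when resolving the creation target.
SET search_path = s1, s2;
CREATE TABLE xxx ();          -- created as s2.xxx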
>
> > > "Create Table As .." is already handled by setting the skipData flag
> > > of the statement parsetreee before replay:
> >
> > 2) About the handling of CREATE TABLE AS:
> >
> > I think it's not a appropriate approach to set the skipdata flag on
> > subscriber as it cannot handle EXECUTE command in CTAS.
> >
> > CREATE TABLE q5_prep_results AS EXECUTE q5(200, 'DTAAAA');
> >
> > The Prepared statement is a temporary object which we don't replicate.
> > So if you directly execute the original SQL on subscriber, even if you
> > set skipdata it will fail.
> >
> > I think it difficult to make this work as you need handle the
> > create/drop of this prepared statement. And even if we extended
> > subscriber's code to make it work, it doesn't seems like a standard and
> elegant approach.
>
> This is indeed an interesting case, thanks for pointing this out. One light weight
> solution I can think of is to directly deparse the parsetree on the publisher into
> a simple CREATE TABLE statement without the prepared statement and then
> replicate the simple CREATE TABLE statement .
> This doesn't have to involve the json format though.
I thought about this solution as well, but I am not very sure about it; it
feels a bit hacky to do this directly instead of using a standard event
trigger (or introducing a new type of event trigger).
> > > "Alter Table .. " that rewrites with volatile expressions can also
> > > be handled without any syntax change, by enabling the table rewrite
> > > replication and converting the rewrite inserts to updates. ZJ's patch
> introduced this solution.
> >
> > 3) About the handling of ALTER TABLE rewrite.
> >
> > The approach I proposed before is based on the event trigger +
> > deparser approach. We were able to improve that approach as we don't
> > need to replicate the rewrite in many cases. For example: we don't
> > need to replicate rewrite dml if there is no volatile/mutable
> > function. We should check and filter these case at publisher (e.g. via
> deparser) instead of checking that at subscriber.
>
> Surely we can make the check about volatile/mutable functions on the
> publisher side as well. It doesn't have to be done via the deparser.
>
> > Besides, as discussed, we need to give warning or error for the cases
> > when DDL contains volatile function which would be executed[2]. We
> > should check this at publisher as well(via deparser).
>
> Again, I think the check doesn't have to be done via the deparser.
Personally, I think it's not great to add lots of logical replication
related code (checks for rewrite DDL, checks for function volatility) in
utility.c or tablecmds.c; that seems a bit ad hoc.
Best regards,
Hou zj
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-23 10:00:27 |
| Message-ID: | CAA4eK1KsMv1JmoyfUBnOZryW4X+1HOQiDwkP5R6BJZnFKrUErg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, Jun 22, 2022 at 11:09 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> I've attached a WIP patch for adding regression tests for DDL deparse.
> The patch can be applied on
> v9-0001-Functions-to-deparse-DDL-commands.patch Hou recently
> submitted[1]. The basic idea is to define the event trigger to deparse
> DDLs, run the regression tests, load the deparsed DDLs to another
> database cluster, dump both databases, and compare the dumps.
>
Thanks for working on this. It is a good start, and I think it will be
helpful for seeing the missing DDL support. Do we want to run this as part
of every regression run? Won't it increase the regression time, as this
seems to internally run the regression tests twice?
Do we need a different trigger to capture drop cases as there are
separate deparsing routines for them, for example
deparse_drop_table()?
> [2] deparsing "ALTER INDEX tbl_idx ALTER COLUMN 2 SET STATISTICS
> 1000;" causes an assertion failure.
>
Sorry, it is not clear to me whether you are talking about a pre-existing
bug or a bug in the proposed patch.
--
With Regards,
Amit Kapila.
| From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-23 12:43:56 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On 2022-Jun-15, houzj(dot)fnst(at)fujitsu(dot)com wrote:
> On Wednesday, June 15, 2022 8:14 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > How does the deparser deparses CREATE FUNCTION STATEMENT? Will it
> > schema qualify
> > objects inside the function definition?
>
> The current deparser doesn't schema qualify objects inside the function
> source as we won't know the schema of inner objects until the function is
> executed. The deparser will only schema qualify the objects around
> function declaration Like:
>
> CREATE FUNCTION [public].test_func(i [pg_catalog].int4 ) RETURNS [pg_catalog].int4 LANGUAGE plpgsql
Right, this is by design. There is no way to deparse a function body --
as far as the backend is concerned, the body is just an opaque string.
That string is to be interpreted by the language handler only.
I don't know if it's possible to do differently for non-core PLs, but I do
not think we have to worry about them in the Postgres implementation.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"La victoria es para quien se atreve a estar solo"
| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-24 02:39:28 |
| Message-ID: | CAD21AoBghkaJ4AzN0XE-4J6OCWH5JNQWops6eiwT0_FeGt4g+A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, Jun 23, 2022 at 7:00 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Jun 22, 2022 at 11:09 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > I've attached a WIP patch for adding regression tests for DDL deparse.
> > The patch can be applied on
> > v9-0001-Functions-to-deparse-DDL-commands.patch Hou recently
> > submitted[1]. The basic idea is to define the event trigger to deparse
> > DDLs, run the regression tests, load the deparsed DDLs to another
> > database cluster, dump both databases, and compare the dumps.
> >
>
> Thanks for working on this. It is a good start. I think this will be
> helpful to see the missing DDL support. Do we want to run this as part
> of every regression run? Won't it increase the regression time as this
> seems to run internally the regression tests twice?
Yes, it will increase the regression test time, but we already do a
similar thing in 002_pg_upgrade.pl and 027_stream_regress.pl, and it seems
worth adding to me.
>
> Do we need a different trigger to capture drop cases as there are
> separate deparsing routines for them, for example
> deparse_drop_table()?
Right, we need to capture drop cases with another trigger.
>
> > [2] deparsing "ALTER INDEX tbl_idx ALTER COLUMN 2 SET STATISTICS
> > 1000;" causes an assertion failure.
> >
>
> Sorry, it is not clear to me whether you are talking about some
> pre-existing bug or a bug in the proposed patch?
I meant there is a bug in the v9 DDL deparse patch.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-24 10:19:24 |
| Message-ID: | CAA4eK1KcWnj3-xD+AbV8NZ0uqkE-MA2mb0G=-T3xj_3SRY40tg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Jun 24, 2022 at 8:10 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Jun 23, 2022 at 7:00 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Jun 22, 2022 at 11:09 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > I've attached a WIP patch for adding regression tests for DDL deparse.
> > > The patch can be applied on
> > > v9-0001-Functions-to-deparse-DDL-commands.patch Hou recently
> > > submitted[1]. The basic idea is to define the event trigger to deparse
> > > DDLs, run the regression tests, load the deparsed DDLs to another
> > > database cluster, dump both databases, and compare the dumps.
> > >
> >
> > Thanks for working on this. It is a good start. I think this will be
> > helpful to see the missing DDL support. Do we want to run this as part
> > of every regression run? Won't it increase the regression time as this
> > seems to run internally the regression tests twice?
>
> Yes, It will increase the regression test time but we already do a
> similar thing in 002_pg_upgrade.pl and 027_stream_regress.pl and it
> seems to be worth adding to me.
>
Fair enough. I think here we need to run it twice, once before deparsing
and once after deparsing, whereas those tests seem to run it only once.
That might not matter much, but we can check the timing difference. I
agree that we anyway need something like this to verify the deparsing
code.
> >
> > Do we need a different trigger to capture drop cases as there are
> > separate deparsing routines for them, for example
> > deparse_drop_table()?
>
> Right, we need to capture drop cases by another trigger.
>
And probably something for alter subcommands as well.
--
With Regards,
Amit Kapila.
| From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-26 17:50:27 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On 2022-Jun-22, vignesh C wrote:
> 1) Creation of temporary table fails infinitely in the subscriber.
> CREATE TEMPORARY TABLE temp1 (a int primary key);
>
> The above statement is converted to the below format:
> CREATE TEMPORARY TABLE pg_temp.temp1 (a pg_catalog.int4 ,
> CONSTRAINT temp1_pkey PRIMARY KEY (a));
> While handling the creation of temporary table in the worker, the
> worker fails continuously with the following error:
> 2022-06-22 14:24:01.317 IST [240872] ERROR: schema "pg_temp" does not exist
Perhaps one possible fix is to change the JSON format string used in
deparse_CreateStmt. Currently, the following is used:
+ if (node->ofTypename)
+ fmtstr = "CREATE %{persistence}s TABLE %{if_not_exists}s %{identity}D "
+ "OF %{of_type}T %{table_elements}s "
+ "%{with_clause}s %{on_commit}s %{tablespace}s";
+ else
+ fmtstr = "CREATE %{persistence}s TABLE %{if_not_exists}s %{identity}D "
+ "(%{table_elements:, }s) %{inherits}s "
+ "%{with_clause}s %{on_commit}s %{tablespace}s";
+
+ createStmt =
+ new_objtree_VA(fmtstr, 1,
+ "persistence", ObjTypeString,
+ get_persistence_str(relation->rd_rel->relpersistence));
(Note that the word for the "persistence" element here comes straight
from relation->rd_rel->relpersistence.) Maybe it would be more
appropriate to set the schema to empty when the table is temp, since the
temporary-ness is in the %{persistence} element, and thus there is no
need to schema-qualify the table name.
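With that change, the failing example from upthread would presumably
deparse without the pg_temp qualification:
-- Deparsed form today, which fails on the subscriber:
CREATE TEMPORARY TABLE pg_temp.temp1 (a pg_catalog.int4,
    CONSTRAINT temp1_pkey PRIMARY KEY (a));
-- Deparsed form with the schema left empty for temporary relations:
CREATE TEMPORARY TABLE temp1 (a pg_catalog.int4,
    CONSTRAINT temp1_pkey PRIMARY KEY (a));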
However, that would still replicate a command that involves a temporary
table, which perhaps should not be considered fit for replication. So
another school of thought is that if the %{persistence} is set to
TEMPORARY, then it would be better to skip replicating the command
altogether. I'm not sure how to plug that in the replication layer,
however.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"At least to kernel hackers, who really are human, despite occasional
rumors to the contrary" (LWN.net)
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-28 03:26:39 |
| Message-ID: | CAA4eK1LD0x7gc7857AmHq82tYf-CQhWv82TT_4R-ZhZeiTd0RQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Sun, Jun 26, 2022 at 11:47 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2022-Jun-22, vignesh C wrote:
>
> > 1) Creation of temporary table fails infinitely in the subscriber.
> > CREATE TEMPORARY TABLE temp1 (a int primary key);
> >
> > The above statement is converted to the below format:
> > CREATE TEMPORARY TABLE pg_temp.temp1 (a pg_catalog.int4 ,
> > CONSTRAINT temp1_pkey PRIMARY KEY (a));
> > While handling the creation of temporary table in the worker, the
> > worker fails continuously with the following error:
> > 2022-06-22 14:24:01.317 IST [240872] ERROR: schema "pg_temp" does not exist
>
> Perhaps one possible fix is to change the JSON format string used in
> deparse_CreateStmt. Currently, the following is used:
>
> + if (node->ofTypename)
> + fmtstr = "CREATE %{persistence}s TABLE %{if_not_exists}s %{identity}D "
> + "OF %{of_type}T %{table_elements}s "
> + "%{with_clause}s %{on_commit}s %{tablespace}s";
> + else
> + fmtstr = "CREATE %{persistence}s TABLE %{if_not_exists}s %{identity}D "
> + "(%{table_elements:, }s) %{inherits}s "
> + "%{with_clause}s %{on_commit}s %{tablespace}s";
> +
> + createStmt =
> + new_objtree_VA(fmtstr, 1,
> + "persistence", ObjTypeString,
> + get_persistence_str(relation->rd_rel->relpersistence));
>
> (Note that the word for the "persistence" element here comes straight
> from relation->rd_rel->relpersistence.) Maybe it would be more
> appropriate to set the schema to empty when the table is temp, since the
> temporary-ness is in the %{persistence} element, and thus there is no
> need to schema-qualify the table name.
>
>
> However, that would still replicate a command that involves a temporary
> table, which perhaps should not be considered fit for replication. So
> another school of thought is that if the %{persistence} is set to
> TEMPORARY, then it would be better to skip replicating the command
> altogether.
>
+1. I think it doesn't make sense to replicate temporary tables.
Similarly, we don't need to replicate unlogged tables.
> I'm not sure how to plug that in the replication layer,
> however.
>
I see two possibilities: (a) we can check the persistence and skip logging
the command in the event trigger where the patch deparses the DDL and
WAL-logs it, or (b) we can add a similar check in pgoutput.c where we send
the DDL downstream.
I feel (a) is better, as it saves additional WAL, unless it is difficult
to detect at that stage.
--
With Regards,
Amit Kapila.
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-28 12:13:44 |
| Message-ID: | CAA4eK1K88SMoBq=DRA4XU-F3FG6qyzCjGMMKsPpcRBPRcrELrw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, Jun 21, 2022 at 5:49 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Monday, June 20, 2022 11:32 AM houzj(dot)fnst(at)fujitsu(dot)com <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
>
> Attach the new version patch set which added support for CREATE/DROP/ATER
> Sequence and CREATE/DROP Schema ddl commands which are provided by Ajin
> Cherian off list.
>
A few questions/comments on v9-0001-Functions-to-deparse-DDL-commands
===========================================================
1.
+/*
+ * Similar to format_type_internal, except we return each bit of information
+ * separately:
...
...
+static void
+format_type_detailed(Oid type_oid, int32 typemod,
+ Oid *nspid, char **typname, char **typemodstr,
+ bool *typarray)
The function mentioned in the comments seems to have been changed to
format_type_extended in commit a26116c6. If so, change the comment
accordingly.
2. It is not clear to me why format_type_detailed needs to use the
'peculiar_typmod' label and a goto statement. In format_type_extended, we
have similar code without using the goto statement, so can't we use a
similar approach here as well?
3.
format_type_detailed()
{
...
+ typeform->typstorage != 'p')
It is better to replace the constant 'p' with TYPSTORAGE_PLAIN.
4. It seems to me that the handling of some of the built-in types is
different between format_type_detailed and format_type_extended. Can
we add some comments to explain the same?
5.
+static ObjTree *
+deparse_CreateStmt(Oid objectId, Node *parsetree)
{
...
+ tmp = new_objtree_VA("TABLESPACE %{tablespace}I", 0);
+ if (node->tablespacename)
+ append_string_object(tmp, "tablespace", node->tablespacename);
+ else
+ {
+ append_null_object(tmp, "tablespace");
+ append_bool_object(tmp, "present", false);
+ }
+ append_object_object(createStmt, "tablespace", tmp);
...
}
Why do we need to append the objects (tablespace, with clause, etc.) when
they are not present in the actual CREATE TABLE statement? The reason for
asking is that this makes the string that we want to send downstream much
longer than the actual statement given by the user on the publisher.
--
With Regards,
Amit Kapila.
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-28 16:47:31 |
| Message-ID: | CALDaNm3rEA_zmnDMOCT7NqK4aAffhAgooLf8rXjUN=YwA8ASFw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, Jun 21, 2022 at 5:49 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Monday, June 20, 2022 11:32 AM houzj(dot)fnst(at)fujitsu(dot)com <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Saturday, June 18, 2022 3:38 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > > On Wed, Jun 15, 2022 at 12:00 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > > wrote:
> > > >
> > > > On Wed, Jun 15, 2022 at 5:44 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > > > >
> > > > >
> > > > > While I agree that the deparser is needed to handle the potential
> > > > > syntax differences between the pub/sub, I think it's only relevant
> > > > > for the use cases where only a subset of tables in the database are
> > > > > replicated. For other use cases where all tables, functions and
> > > > > other objects need to be replicated, (for example, creating a
> > > > > logical replica for major version upgrade) there won't be any syntax
> > > > > difference to handle and the schemas are supposed to match exactly
> > > > > between the pub/sub. In other words the user seeks to create an
> > > > > identical replica of the source database and the DDLs should be
> > > > > replicated as is in this case.
> > > > >
> > > >
> > > > I think even for database-level replication we can't assume that
> > > > source and target will always have the same data in which case "Create
> > > > Table As ..", "Alter Table .. " kind of statements can't be replicated
> > > > as it is because that can lead to different results.
> > > "Create Table As .." is already handled by setting the skipData flag of the
> > > statement parsetreee before replay:
> > >
> > > /*
> > > * Force skipping data population to avoid data inconsistency.
> > > * Data should be replicated from the publisher instead.
> > > */
> > > castmt->into->skipData = true;
> > >
> > > "Alter Table .. " that rewrites with volatile expressions can also be handled
> > > without any syntax change, by enabling the table rewrite replication and
> > > converting the rewrite inserts to updates. ZJ's patch introduced this solution.
> > > I've also adopted this approach in my latest patch
> > > 0012-Support-replication-of-ALTER-TABLE-commands-that-rew.patch
> > >
> > > > The other point
> > > > is tomorrow we can extend the database level option/syntax to exclude
> > > > a few objects (something like [1]) as well in which case we again need
> > > > to filter at the publisher level
> > >
> > > I think for such cases it's not full database replication and we could treat it as
> > > table level DDL replication, i.e. use the the deparser format.
> >
> > Hi,
> >
> > Here are some points in my mind about the two approaches discussed here.
> >
> > 1) search_patch vs schema qualify
> >
> > Again, I still think it will bring more flexibility and security by schema qualify the
> > objects in DDL command as mentioned before[1].
> >
> > Besides, a schema qualified DDL is also more appropriate for other use
> > cases(e.g. a table-level replication). As it's possible the schema is different
> > between pub/sub and it's easy to cause unexpected and undetectable failure if
> > we just log the search_path.
> >
> > It makes more sense to me to have the same style WAL log(schema qualified)
> > for
> > both database level or table level replication as it will bring more
> > flexibility.
> >
> >
> > > "Create Table As .." is already handled by setting the skipData flag of the
> > > statement parsetreee before replay:
> >
> > 2) About the handling of CREATE TABLE AS:
> >
> > I think it's not a appropriate approach to set the skipdata flag on subscriber
> > as it cannot handle EXECUTE command in CTAS.
> >
> > CREATE TABLE q5_prep_results AS EXECUTE q5(200, 'DTAAAA');
> >
> > The Prepared statement is a temporary object which we don't replicate. So if
> > you directly execute the original SQL on subscriber, even if you set skipdata
> > it will fail.
> >
> > I think it difficult to make this work as you need handle the create/drop of
> > this prepared statement. And even if we extended subscriber's code to make it
> > work, it doesn't seems like a standard and elegant approach.
> >
> >
> > > "Alter Table .. " that rewrites with volatile expressions can also be handled
> > > without any syntax change, by enabling the table rewrite replication and
> > > converting the rewrite inserts to updates. ZJ's patch introduced this solution.
> >
> > 3) About the handling of ALTER TABLE rewrite.
> >
> > The approach I proposed before is based on the event trigger + deparser
> > approach. We were able to improve that approach as we don't need to
> > replicate
> > the rewrite in many cases. For example: we don't need to replicate rewrite dml
> > if there is no volatile/mutable function. We should check and filter these case
> > at publisher (e.g. via deparser) instead of checking that at subscriber.
> >
> > Besides, as discussed, we need to give warning or error for the cases when DDL
> > contains volatile function which would be executed[2]. We should check this at
> > publisher as well(via deparser).
> >
> >
> > > I think for such cases it's not full database replication and we could treat it as
> > > table level DDL replication, i.e. use the the deparser format.
> >
> > 4) I think the point could be that we should make the WAL log format
> > extensible so that we can extend it to support more useful features (table
> > filters/schema maps/DDL filters). If we just WAL-log the original SQL, it
> > seems difficult to extend in the future.
>
> Attached is the new version patch set, which adds support for the
> CREATE/DROP/ALTER SEQUENCE and CREATE/DROP SCHEMA DDL commands provided by
> Ajin Cherian off-list.
>
> The new version of the patch will also check each function's volatility[1] in
> the ALTER TABLE command. If any function to be executed is volatile, we report
> an ERROR. Whether a WARNING would be better here is still under consideration.
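(For reference, the volatility check described above could presumably build on the
existing expression walker; a rough, untested sketch, where "expr" is only a
placeholder for the transformed expression and not a name from the patch:

    /* Sketch: reject replicating an ALTER TABLE that would run a volatile function */
    if (contain_volatile_functions((Node *) expr))
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("cannot replicate ALTER TABLE command containing a volatile function")));
)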
Few comments:
1) I found a null pointer access while trying to use the ddl feature:
#0 0x000055e9deb2904b in EventTriggerTableInitWrite
(real_create=0x55e9e0eb0288, address=...) at event_trigger.c:989
#1 0x000055e9deb15745 in create_ctas_internal
(attrList=0x55e9e0eb0140, into=0x55e9e0e86710) at createas.c:154
#2 0x000055e9deb16181 in intorel_startup (self=0x55e9e0e86d00,
operation=1, typeinfo=0x55e9e0ea99d0) at createas.c:526
#3 0x000055e9debdcfdc in standard_ExecutorRun
(queryDesc=0x55e9e0e8f240, direction=ForwardScanDirection, count=0,
execute_once=true) at execMain.c:352
#4 0x000055e9debdcf0b in ExecutorRun (queryDesc=0x55e9e0e8f240,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:307
#5 0x000055e9deb15cd2 in ExecCreateTableAs (pstate=0x55e9e0e86830,
stmt=0x55e9e0e717e8, params=0x0, queryEnv=0x0, qc=0x7fff45517190) at
createas.c:346
#6 0x000055e9dee3202a in ProcessUtilitySlow (pstate=0x55e9e0e86830,
pstmt=0x55e9e0e70b18,
queryString=0x55e9e0e483e0 "CREATE TEMP TABLE
pg_temp_5.pg_temp_24961_2 AS SELECT mv.ctid AS tid,
newdata.*::pg_temp_5.pg_temp_24961 AS newdata FROM public.sro_index_mv
mv FULL JOIN pg_temp_5.pg_temp_24961 newdata ON (newdata.c "...,
context=PROCESS_UTILITY_QUERY, params=0x0, queryEnv=0x0,
dest=0x55e9df2e7640 <spi_printtupDR>, qc=0x7fff45517190) at
utility.c:1669
#7 0x000055e9dee30b5d in standard_ProcessUtility (pstmt=0x55e9e0e70b18,
queryString=0x55e9e0e483e0 "CREATE TEMP TABLE
pg_temp_5.pg_temp_24961_2 AS SELECT mv.ctid AS tid,
newdata.*::pg_temp_5.pg_temp_24961 AS newdata FROM public.sro_index_mv
mv FULL JOIN pg_temp_5.pg_temp_24961 newdata ON (newdata.c "...,
readOnlyTree=true, context=PROCESS_UTILITY_QUERY, params=0x0,
queryEnv=0x0, dest=0x55e9df2e7640 <spi_printtupDR>, qc=0x7fff45517190)
at utility.c:1074
#8 0x000055e9dee2fac7 in ProcessUtility (pstmt=0x55e9e0e19538,
queryString=0x55e9e0e483e0 "CREATE TEMP TABLE
pg_temp_5.pg_temp_24961_2 AS SELECT mv.ctid AS tid,
newdata.*::pg_temp_5.pg_temp_24961 AS newdata FROM public.sro_index_mv
mv FULL JOIN pg_temp_5.pg_temp_24961 newdata ON (newdata.c "...,
readOnlyTree=true, context=PROCESS_UTILITY_QUERY, params=0x0,
queryEnv=0x0, dest=0x55e9df2e7640 <spi_printtupDR>, qc=0x7fff45517190)
at utility.c:530
#9 0x000055e9dec3c5f1 in _SPI_execute_plan (plan=0x7fff45517230,
options=0x7fff45517200, snapshot=0x0, crosscheck_snapshot=0x0,
fire_triggers=true) at spi.c:2693
#10 0x000055e9dec38ea8 in SPI_execute (
src=0x55e9e0e483e0 "CREATE TEMP TABLE pg_temp_5.pg_temp_24961_2 AS
SELECT mv.ctid AS tid, newdata.*::pg_temp_5.pg_temp_24961 AS newdata
FROM public.sro_index_mv mv FULL JOIN pg_temp_5.pg_temp_24961 newdata
ON (newdata.c "..., read_only=false, tcount=0) at spi.c:618
#11 0x000055e9dec38efd in SPI_exec (
src=0x55e9e0e483e0 "CREATE TEMP TABLE pg_temp_5.pg_temp_24961_2 AS
SELECT mv.ctid AS tid, newdata.*::pg_temp_5.pg_temp_24961 AS newdata
FROM public.sro_index_mv mv FULL JOIN pg_temp_5.pg_temp_24961 newdata
ON (newdata.c "..., tcount=0) at spi.c:630
#12 0x000055e9deb5360a in refresh_by_match_merge (matviewOid=24961,
tempOid=24966, relowner=24909, save_sec_context=0) at matview.c:795
#13 0x000055e9deb528ca in ExecRefreshMatView (stmt=0x55e9e0d3e670,
queryString=0x55e9e0d3dc18 "REFRESH MATERIALIZED VIEW CONCURRENTLY
sro_index_mv;", params=0x0, qc=0x7fff45517d40) at matview.c:317
currentCommand seems to be null here:
+ /*
+ * Also do nothing if our state isn't set up, which it won't be if there
+ * weren't any relevant event triggers at the start of the current DDL
+ * command. This test might therefore seem optional, but it's
+ * *necessary*, because EventTriggerCommonSetup might find triggers that
+ * didn't exist at the time the command started.
+ */
+ if (!currentEventTriggerState)
+ return;
+
+ command = currentEventTriggerState->currentCommand;
+
+ runlist = EventTriggerCommonSetup(command->parsetree,
+                                   EVT_TableInitWrite,
+                                   "table_init_write",
+                                   &trigdata);
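A minimal defensive fix here (just a sketch, not verified against the rest of the
patch) might be to also bail out when no collected command is available:

+	command = currentEventTriggerState->currentCommand;
+
+	/* Sketch: nothing to do if no command was collected for this DDL */
+	if (command == NULL)
+		return;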
2) Missing copyright information:
diff --git a/src/include/tcop/ddl_deparse.h b/src/include/tcop/ddl_deparse.h
new file mode 100644
index 0000000..4f3c55f
--- /dev/null
+++ b/src/include/tcop/ddl_deparse.h
@@ -0,0 +1,12 @@
+#ifndef DDL_DEPARSE_H
+#define DDL_DEPARSE_H
+
+#include "tcop/deparse_utility.h"
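For reference, the usual boilerplate would look something like the following
(the one-line description is only a placeholder; path/years adjusted as needed):

/*-------------------------------------------------------------------------
 *
 * ddl_deparse.h
 *	  Declarations for DDL deparsing.
 *
 * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * src/include/tcop/ddl_deparse.h
 *
 *-------------------------------------------------------------------------
 */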
3) This line removal is not required:
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 87aa571..8aa636c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11884,5 +11884,10 @@
proname => 'brin_minmax_multi_summary_send', provolatile => 's',
prorettype => 'bytea', proargtypes => 'pg_brin_minmax_multi_summary',
prosrc => 'brin_minmax_multi_summary_send' },
-
+{ oid => '4642', descr => 'ddl deparse',
+ proname => 'ddl_deparse_to_json', prorettype => 'text',
4) We could add a few comments:
+typedef enum
+{
+ SpecTypename,
+ SpecOperatorname,
+ SpecDottedName,
+ SpecString,
+ SpecNumber,
+ SpecStringLiteral,
+ SpecIdentifier,
+ SpecRole
+} convSpecifier;
+
+typedef enum
+{
+ tv_absent,
+ tv_true,
+ tv_false
+} trivalue;
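For instance, something along these lines (the wording is only a suggestion, and
the per-specifier notes are my reading of the format strings used elsewhere in
the patch):

/* Conversion specifiers recognized in the deparse format strings */
typedef enum
{
	SpecTypename,		/* a type name, e.g. %{coltype}T */
	SpecOperatorname,	/* an operator name */
	SpecDottedName,		/* a possibly schema-qualified name, e.g. %{identity}D */
	SpecString,			/* a plain string, e.g. %{persistence}s */
	SpecNumber,			/* a numeric value */
	SpecStringLiteral,	/* a quoted string literal, e.g. %{value}L */
	SpecIdentifier,		/* a quoted identifier, e.g. %{tablespace}I */
	SpecRole			/* a role name */
} convSpecifier;

/* A boolean found in the JSON blob, which may also simply be absent */
typedef enum
{
	tv_absent,
	tv_true,
	tv_false
} trivalue;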
5) Missing function header:
+static void fmtstr_error_callback(void *arg);
+char *ddl_deparse_json_to_string(char *jsonb);
+
+static trivalue
+find_bool_in_jsonbcontainer(JsonbContainer *container, char *keyname)
+{
+ JsonbValue key;
+ JsonbValue *value;
+ trivalue result;
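Something modeled on the comment of its sibling find_string_in_jsonbcontainer
would probably do (my wording, to be checked against the actual behavior):

/*
 * Given a JsonbContainer, find the boolean value stored under the given key
 * name and return it as a trivalue: tv_absent if the key is not present,
 * otherwise tv_true or tv_false according to the stored value.
 */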
6) This can be removed:
+/*
+removed feature
+ case AT_AddOidsRecurse:
+ case AT_AddOids:
+ tmp = new_objtree_VA("SET WITH OIDS", 1,
+                      "type", ObjTypeString, "set with oids");
+ subcmds = lappend(subcmds, new_object_object(tmp));
+ break;
+*/
Regards,
Vignesh
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-29 03:06:54 |
| Message-ID: | CAA4eK1+3ac2qXZZYfdiobuOF17e60v-fiFMG7HfJx93WbEkFhQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, Jun 28, 2022 at 5:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> 5.
> +static ObjTree *
> +deparse_CreateStmt(Oid objectId, Node *parsetree)
> {
> ...
> + tmp = new_objtree_VA("TABLESPACE %{tablespace}I", 0);
> + if (node->tablespacename)
> + append_string_object(tmp, "tablespace", node->tablespacename);
> + else
> + {
> + append_null_object(tmp, "tablespace");
> + append_bool_object(tmp, "present", false);
> + }
> + append_object_object(createStmt, "tablespace", tmp);
> ...
> }
>
> Why do we need to append the objects (tablespace, with clause, etc.)
> when they are not present in the actual CREATE TABLE statement? The
> reason to ask this is that this makes the string that we want to send
> downstream much longer than the actual statement given by the user on
> the publisher.
>
After thinking some more on this, it seems the user may want to
optionally change some of these attributes; for example, on the
subscriber they may want to associate the table with a different
tablespace. I think to address that we can append these additional
attributes optionally, say via an additional parameter
(append_all_options/append_all_attributes or something like that) in
exposed APIs like deparse_utility_command().
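In other words, the exposed entry point could grow an extra flag along these
lines (the parameter name and the exact signature here are only illustrative,
not taken from the patch):

/*
 * Sketch only: when append_all_options is false, clauses that were not
 * present in the original command (TABLESPACE, WITH, ...) would be omitted
 * from the deparsed output.
 */
extern char *deparse_utility_command(CollectedCommand *cmd,
									 bool append_all_options);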
--
With Regards,
Amit Kapila.
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-06-29 09:47:09 |
| Message-ID: | OS0PR01MB571684BCD808F0480066F38B94BB9@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tuesday, June 28, 2022 11:27 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> On Sun, Jun 26, 2022 at 11:47 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
> wrote:
> >
> > On 2022-Jun-22, vignesh C wrote:
> >
> > > 1) Creation of temporary table fails infinitely in the subscriber.
> > > CREATE TEMPORARY TABLE temp1 (a int primary key);
> > >
> > > The above statement is converted to the below format:
> > > CREATE TEMPORARY TABLE pg_temp.temp1 (a pg_catalog.int4 ,
> > > CONSTRAINT temp1_pkey PRIMARY KEY (a)); While handling the creation
> > > of temporary table in the worker, the worker fails continuously with
> > > the following error:
> > > 2022-06-22 14:24:01.317 IST [240872] ERROR: schema "pg_temp" does
> > > not exist
> >
> > Perhaps one possible fix is to change the JSON format string used in
> > deparse_CreateStmt. Currently, the following is used:
> >
> > + if (node->ofTypename)
> > + fmtstr = "CREATE %{persistence}s
> TABLE %{if_not_exists}s %{identity}D "
> > + "OF %{of_type}T %{table_elements}s "
> > + "%{with_clause}s %{on_commit}s %{tablespace}s";
> > + else
> > + fmtstr = "CREATE %{persistence}s
> TABLE %{if_not_exists}s %{identity}D "
> > + "(%{table_elements:, }s) %{inherits}s "
> > + "%{with_clause}s %{on_commit}s
> > + %{tablespace}s";
> > +
> > + createStmt =
> > + new_objtree_VA(fmtstr, 1,
> > + "persistence", ObjTypeString,
> > +
> > + get_persistence_str(relation->rd_rel->relpersistence));
> >
> > (Note that the word for the "persistence" element here comes straight
> > from relation->rd_rel->relpersistence.) Maybe it would be more
> > appropriate to set the schema to empty when the table is temp, since
> > the temporary-ness is in the %{persistence} element, and thus there is
> > no need to schema-qualify the table name.
> >
> >
> > However, that would still replicate a command that involves a
> > temporary table, which perhaps should not be considered fit for
> > replication. So another school of thought is that if the
> > %{persistence} is set to TEMPORARY, then it would be better to skip
> > replicating the command altogether.
> >
>
> +1. I think it doesn't make sense to replicate temporary tables.
> Similarly, we don't need to replicate the unlogged tables.
I agree that we don't need to replicate temporary tables.
For unlogged tables, one thing I noticed is that we always replicate the
DDL action on unlogged tables in streaming replication. So, to be
consistent, maybe we need to generate WAL for DDL on unlogged tables as
well?
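(For what it's worth, skipping temporary relations in the event-trigger path
could presumably be as simple as the following sketch, where "relation" stands
for the target relation and is only a placeholder name:

    /* Sketch: do not WAL-log DDL whose target is a temporary relation */
    if (relation->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
        return;
)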
Best regards,
Hou zj
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-06-29 09:54:40 |
| Message-ID: | OS0PR01MB571676298B814F7F5690D92994BB9@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wednesday, June 29, 2022 11:07 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Jun 28, 2022 at 5:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >
> > 5.
> > +static ObjTree *
> > +deparse_CreateStmt(Oid objectId, Node *parsetree)
> > {
> > ...
> > + tmp = new_objtree_VA("TABLESPACE %{tablespace}I", 0); if
> > + (node->tablespacename) append_string_object(tmp, "tablespace",
> > + node->tablespacename); else { append_null_object(tmp, "tablespace");
> > + append_bool_object(tmp, "present", false); }
> > + append_object_object(createStmt, "tablespace", tmp);
> > ...
> > }
> >
> > Why do we need to append the objects (tablespace, with clause, etc.)
> > when they are not present in the actual CREATE TABLE statement? The
> > reason to ask this is that this makes the string that we want to send
> > downstream much longer than the actual statement given by the user on
> > the publisher.
> >
>
> After thinking some more on this, it seems the user may want to optionally
> change some of these attributes, for example, on the subscriber, it may want to
> associate the table with a different tablespace. I think to address that we can
> append these additional attributes optionally, say via an additional parameter
> (append_all_options/append_all_attributes or something like that) in exposed
> APIs like deparse_utility_command().
I agree and will research this part.
And here is the new version patch set.
Most of the changes are in the deparser, which now includes:
support CREATE PARTITIONED TABLE
support ALTER ATTACH/DETACH PARTITION
support CREATE/ALTER TABLE with ACCESS METHOD
support CREATE TABLE OF
support CREATE/ALTER TABLE with GENERATED COLUMN
support CREATE/ALTER TABLE with IDENTITY COLUMN
support CREATE/ALTER TABLE with COMPRESSION METHOD
support ALTER COLUMN numofcol SET STATISTICS (mentioned by sawada-san [1])
support ALTER SCHEMA
support CREATE/DROP INDEX
Note that, for ATTACH/DETACH PARTITION, I haven't added extra logic on the
subscriber to handle the case where the table on the publisher is a partitioned
table while the target table on the subscriber side is a normal table. We will
research this more and improve it later.
Besides, in the new version the event trigger won't WAL-log DDL whose target
table is a temporary table, so the problem reported by Vignesh[2] is
fixed.
About the recent comments from Amit[3] and Vignesh[4], I will investigate
them and address them in the next version.
[1] https://www.postgresql.org/message-id/CAD21AoBVCoPPRKvU_5-%3DwEXsa92GsNJFJOcYyXzvoSEJCx5dKw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CALDaNm33W35pcBE3zOpJhwnYBdBoZDpKxssemAN21NwVhJuong%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAA4eK1K88SMoBq%3DDRA4XU-F3FG6qyzCjGMMKsPpcRBPRcrELrw%40mail.gmail.com
[4] https://www.postgresql.org/message-id/CALDaNm3rEA_zmnDMOCT7NqK4aAffhAgooLf8rXjUN%3DYwA8ASFw%40mail.gmail.com
Best regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v10-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 14.6 KB |
| v10-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 132.6 KB |
| v10-0002-Support-DDL-replication.patch | application/octet-stream | 125.4 KB |
| v10-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 22.5 KB |
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-29 10:48:12 |
| Message-ID: | CAA4eK1L_Duk83otuAwtvGo21QWpKk4ptBib60H8wkFBxcPfcBw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, Jun 29, 2022 at 3:24 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Wednesday, June 29, 2022 11:07 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Jun 28, 2022 at 5:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > wrote:
> > >
> > > 5.
> > > +static ObjTree *
> > > +deparse_CreateStmt(Oid objectId, Node *parsetree)
> > > {
> > > ...
> > > + tmp = new_objtree_VA("TABLESPACE %{tablespace}I", 0); if
> > > + (node->tablespacename) append_string_object(tmp, "tablespace",
> > > + node->tablespacename); else { append_null_object(tmp, "tablespace");
> > > + append_bool_object(tmp, "present", false); }
> > > + append_object_object(createStmt, "tablespace", tmp);
> > > ...
> > > }
> > >
> > > Why do we need to append the objects (tablespace, with clause, etc.)
> > > when they are not present in the actual CREATE TABLE statement? The
> > > reason to ask this is that this makes the string that we want to send
> > > downstream much longer than the actual statement given by the user on
> > > the publisher.
> > >
> >
> > After thinking some more on this, it seems the user may want to optionally
> > change some of these attributes, for example, on the subscriber, it may want to
> > associate the table with a different tablespace. I think to address that we can
> > append these additional attributes optionally, say via an additional parameter
> > (append_all_options/append_all_attributes or something like that) in exposed
> > APIs like deparse_utility_command().
>
> I agree and will research this part.
>
Okay, note that similar handling would be required at other places
like deparse_ColumnDef. Few other comments on
v9-0001-Functions-to-deparse-DDL-commands.
1.
+static ObjElem *new_bool_object(bool value);
+static ObjElem *new_string_object(char *value);
+static ObjElem *new_object_object(ObjTree *value);
+static ObjElem *new_array_object(List *array);
+static ObjElem *new_integer_object(int64 value);
+static ObjElem *new_float_object(float8 value);
Here, new_object_object() seems to be declared out-of-order (not in
sync with its order of definition). Similarly, see all other append_*
functions.
2. The function printTypmod in ddl_deparse.c appears to be the same as
the function with the same name in format_type.c. If so, isn't it
better to have a single copy of that function?
3. As I pointed out yesterday, there is some similarity between
format_type_extended and format_type_detailed. Can we try to extract
common functionality? If this is feasible, it is better to do this as
a separate patch. Also, this can obviate the need to expose
printTypmod from format_type.c. I am not really sure if this will be
better than the current one or not but it seems worth trying.
4.
new_objtree_VA()
{
...
switch (type)
+ {
+ case ObjTypeBool:
+ elem = new_bool_object(va_arg(args, int));
+ break;
+ case ObjTypeString:
+ elem = new_string_object(va_arg(args, char *));
+ break;
+ case ObjTypeObject:
+ elem = new_object_object(va_arg(args, ObjTree *));
+ break;
+ case ObjTypeArray:
+ elem = new_array_object(va_arg(args, List *));
+ break;
+ case ObjTypeInteger:
+ elem = new_integer_object(va_arg(args, int64));
+ break;
+ case ObjTypeFloat:
+ elem = new_float_object(va_arg(args, double));
+ break;
+ case ObjTypeNull:
+ /* Null params don't have a value (obviously) */
+ elem = new_null_object();
...
I feel the ObjType values should be handled here in the same order as they
are defined in ObjType.
5. There appears to be common code among the new_*_object() functions.
Can we just have one function instead, say new_object(ObjType, void *val)?
In that function, based on the type, we can typecast the value. Is there a
reason why that won't be better than the current one?
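Roughly what I have in mind (the ObjElem field names below are illustrative
only, not taken from the patch):

static ObjElem *
new_object(ObjType type, void *value)
{
	ObjElem    *elem = palloc0(sizeof(ObjElem));

	elem->objtype = type;
	switch (type)
	{
		case ObjTypeBool:
			elem->value.boolean = *(bool *) value;
			break;
		case ObjTypeString:
			elem->value.string = (char *) value;
			break;
		/* ... remaining ObjType members handled analogously ... */
		default:
			break;
	}
	return elem;
}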
6.
deparse_ColumnDef()
{
...
/* Composite types use a slightly simpler format string */
+ if (composite)
+ column = new_objtree_VA("%{name}I %{coltype}T %{collation}s",
+ 3,
+ "type", ObjTypeString, "column",
+ "name", ObjTypeString, coldef->colname,
+ "coltype", ObjTypeObject,
+ new_objtree_for_type(typid, typmod));
+ else
+ column = new_objtree_VA("%{name}I %{coltype}T %{default}s
%{not_null}s %{collation}s",
+ 3,
+ "type", ObjTypeString, "column",
+ "name", ObjTypeString, coldef->colname,
+ "coltype", ObjTypeObject,
+ new_objtree_for_type(typid, typmod));
...
}
To avoid using the same arguments, we can define fmtstr for composite
and non-composite types as the patch is doing in deparse_CreateStmt().
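I.e. something like the following, mirroring deparse_CreateStmt() and reusing
the elements already present in the quoted code:

	if (composite)
		fmtstr = "%{name}I %{coltype}T %{collation}s";
	else
		fmtstr = "%{name}I %{coltype}T %{default}s %{not_null}s %{collation}s";

	column = new_objtree_VA(fmtstr, 3,
							"type", ObjTypeString, "column",
							"name", ObjTypeString, coldef->colname,
							"coltype", ObjTypeObject,
							new_objtree_for_type(typid, typmod));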
7. It is not clear from comments or otherwise what should be
considered for default format string in functions like
deparse_ColumnDef() or deparse_CreateStmt().
8.
+ * FIXME --- actually, what about default values?
+ */
+static ObjTree *
+deparse_ColumnDef_typed(Relation relation, List *dpcontext, ColumnDef *coldef)
I think we need to handle default values for this FIXME.
--
With Regards,
Amit Kapila.
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-30 06:14:22 |
| Message-ID: | CAA4eK1KmnEw6Kr1Qf=t_p3qxO2UrWU-FGePJGOqR390PD5ie7g@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, Jun 29, 2022 at 3:17 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Tuesday, June 28, 2022 11:27 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > On Sun, Jun 26, 2022 at 11:47 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
> > wrote:
> > >
> > > However, that would still replicate a command that involves a
> > > temporary table, which perhaps should not be considered fit for
> > > replication. So another school of thought is that if the
> > > %{persistence} is set to TEMPORARY, then it would be better to skip
> > > replicating the command altogether.
> > >
> >
> > +1. I think it doesn't make sense to replicate temporary tables.
> > Similarly, we don't need to replicate the unlogged tables.
>
> I agree that we don’t need to replicate temporary tables.
>
> For unlogged table, one thing I noticed is that we always replicate the
> DDL action on unlogged table in streaming replication. So, to be
> consistent, maybe we need to generate WAL for DDL on unlogged table as
> well ?
>
We don't seem to WAL-log the main fork, so that shouldn't be created
in physical replication, whereas in your case it will create the main
fork unless you are doing some special handling for subscribers/apply
workers. We are also not allowed to read unlogged tables on a standby,
whereas after logical replication users will be allowed to operate on
them. I think 'create unlogged table' gets replicated to a physical
standby because we need to insert catalog entries for it, which can't
be selectively logged, but I don't see why we need to behave similarly
for logical replication. Can you think of some reason why we need to
be consistent here, or in other words why we should replicate DDL for
unlogged tables in logical replication? I am not against it, but I
can't see the reason for doing it: if we are not going to replicate
the data of such tables, why should we replicate their schema?
--
With Regards,
Amit Kapila.
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-06-30 08:16:24 |
| Message-ID: | CAA4eK1LLcXcTaRmO_jFHFJAsw2V+Z_zYhm9c8KQKaPr_c0X5Sg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, Jun 30, 2022 at 11:44 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Jun 29, 2022 at 3:17 PM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Tuesday, June 28, 2022 11:27 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > >
> > > +1. I think it doesn't make sense to replicate temporary tables.
> > > Similarly, we don't need to replicate the unlogged tables.
> >
> > I agree that we don’t need to replicate temporary tables.
> >
> > For unlogged table, one thing I noticed is that we always replicate the
> > DDL action on unlogged table in streaming replication. So, to be
> > consistent, maybe we need to generate WAL for DDL on unlogged table as
> > well ?
> >
>
> We don't seem to WAL log the main fork, so that shouldn't be created
> in physical replication whereas in your case it will create the main
> fork unless you are doing some special handling for subscribers/apply
> worker. We are also not allowed to read the unlogged tables on standby
> whereas after logical replication users will be allowed to operate on
> them. I think because we need to insert catalog entries for 'create
> unlogged table' which can't be selectively logged, it gets replicated
> to a physical stand by but I don't see why we need to behave similarly
> for logical replication. Can you think of some reason why we need to
> be consistent here or in other words why we should replicate DDL for
> unlogged tables in logical replication?
>
If we don't replicate the unlogged table DDL 'Create Unlogged Table
...', then later, if the user changes the table to logged with 'Alter
Table ... Set Logged', we would need some special hacking to create the
table. So, I think we should replicate the 'Create Unlogged Table ...'
DDL.
--
With Regards,
Amit Kapila.
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-07-01 16:52:04 |
| Message-ID: | CALDaNm2nFPMxUo=0zRUUA-v3_eRwRY+ii5nnG_PU+6jT7ta9dA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, Jun 29, 2022 at 3:25 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Wednesday, June 29, 2022 11:07 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Jun 28, 2022 at 5:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > wrote:
> > >
> > > 5.
> > > +static ObjTree *
> > > +deparse_CreateStmt(Oid objectId, Node *parsetree)
> > > {
> > > ...
> > > + tmp = new_objtree_VA("TABLESPACE %{tablespace}I", 0); if
> > > + (node->tablespacename) append_string_object(tmp, "tablespace",
> > > + node->tablespacename); else { append_null_object(tmp, "tablespace");
> > > + append_bool_object(tmp, "present", false); }
> > > + append_object_object(createStmt, "tablespace", tmp);
> > > ...
> > > }
> > >
> > > Why do we need to append the objects (tablespace, with clause, etc.)
> > > when they are not present in the actual CREATE TABLE statement? The
> > > reason to ask this is that this makes the string that we want to send
> > > downstream much longer than the actual statement given by the user on
> > > the publisher.
> > >
> >
> > After thinking some more on this, it seems the user may want to optionally
> > change some of these attributes, for example, on the subscriber, it may want to
> > associate the table with a different tablespace. I think to address that we can
> > append these additional attributes optionally, say via an additional parameter
> > (append_all_options/append_all_attributes or something like that) in exposed
> > APIs like deparse_utility_command().
>
> I agree and will research this part.
>
> And here is the new version patch set.
> Most of changes are in the deparser which include:
>
> support CREATE PARTITIONED TABLE
> support ALTER ATTACH/DETACH PARTITION
> support CREATE/ALTER TABLE with ACCESS METHOD
> support CREATE TABLE OF
> support CREATE/ALTER TABLE with GENERATED COLUMN
> support CREATE/ALTER TABLE with IDENTITY COLUMN
> support CREATE/ALTER TABLE with COMPRESSION METHOD
> support ALTER COLUMN numofcol SET STATISTICS (mentioned by sawada-san [1])
> support ALTER SCHEMA
> support CREATE/DROP INDEX
>
> Note that, for ATTACH/DETACH PARTITION, I haven't added extra logic on
> subscriber to handle the case where the table on publisher is a PARTITIONED
> TABLE while the target table on subscriber side is NORMAL table. We will
> research this more and improve this later.
>
> Besides, the new version event trigger won't WAL log the DDL whose target
> table is a temporary table so that the problem reported by Vignesh[2] is
> fixed.
>
> About the recent comment from Amit[3] and Vignesh[4], I will investigate the
> comments and address them in next version.
>
> [1] https://www.postgresql.org/message-id/CAD21AoBVCoPPRKvU_5-%3DwEXsa92GsNJFJOcYyXzvoSEJCx5dKw%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/CALDaNm33W35pcBE3zOpJhwnYBdBoZDpKxssemAN21NwVhJuong%40mail.gmail.com
> [3] https://www.postgresql.org/message-id/CAA4eK1K88SMoBq%3DDRA4XU-F3FG6qyzCjGMMKsPpcRBPRcrELrw%40mail.gmail.com
> [4] https://www.postgresql.org/message-id/CALDaNm3rEA_zmnDMOCT7NqK4aAffhAgooLf8rXjUN%3DYwA8ASFw%40mail.gmail.com
Thanks for the updated patch.
Few comments on 0002 patch:
1) When we create a subscription for a publication with the existing
default PUBLISH parameter (default value
'insert,update,delete,truncate'), we do an initial table sync to get
the initial table data from the publisher to the subscriber. But in the
case of a publication created with 'ddl', the subscription expects the
tables already present on the publisher to have been created
beforehand on the subscriber. Should this be the default behavior?
Should we do a DDL dump for all the tables and restore the DDL while
creating the subscription? Or is this planned as an option for a later
version? If we could do this as part of DDL logical replication, it
would further reduce the steps needed for logical replication setup. If
this will not be supported in this patch series, we should probably
document the behavior and add comments for this at an appropriate
place.
2) When a publication is created with ddl enabled, event triggers
are created implicitly. Currently these event triggers are also
getting dumped, but they should not be: when the publication is
restored, its event trigger will be created, and the explicitly
dumped event triggers will be created as well, resulting in duplicate
event triggers.
@@ -4016,6 +4026,15 @@ dumpPublication(Archive *fout, const
PublicationInfo *pubinfo)
first = false;
}
+ if (pubinfo->pubddl)
+ {
+ if (!first)
+ appendPQExpBufferStr(query, ", ");
+
+ appendPQExpBufferStr(query, "ddl");
+ first = false;
+ }
+
3) SYNTAX support:
Currently, creation of a "FOR TABLE" publication with ddl is supported.
Should we allow support of ddl for "FOR TABLE" publications? The
creation of the subscription will fail saying the table does not exist,
so users will have to create all the tables before creating the
subscription. Is this syntax supported for the use case where the table
is altered after the subscription is created?
ex: create publication pub3 for table t1 with ( PUBLISH= ddl);
4) Few includes can be removed:
4.a) This change is not required, it compiles ok for me without this:
diff --git a/src/backend/access/transam/rmgr.c
b/src/backend/access/transam/rmgr.c
index 8ed6924..312f117 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -27,6 +27,7 @@
#include "fmgr.h"
#include "funcapi.h"
#include "miscadmin.h"
+#include "replication/ddlmessage.h"
#include "replication/decode.h"
#include "replication/message.h"
4.b) This change is not required, it compiles ok for me without this change:
diff --git a/src/backend/utils/adt/ri_triggers.c
b/src/backend/utils/adt/ri_triggers.c
index 51b3fdc..7d60aac 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -33,6 +33,7 @@
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "commands/event_trigger.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
@@ -40,6 +41,7 @@
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
#include "storage/bufmgr.h"
+#include "tcop/ddl_deparse.h"
5) These changes are not required for the patch and can be removed:
5.a) This empty line can be removed
@@ -2153,6 +2329,7 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
entry->pubactions.pubupdate |=
pub->pubactions.pubupdate;
entry->pubactions.pubdelete |=
pub->pubactions.pubdelete;
entry->pubactions.pubtruncate |=
pub->pubactions.pubtruncate;
+ entry->pubactions.pubddl |=
pub->pubactions.pubddl;
/*
* We want to publish the changes as
the top-most ancestor
@@ -2338,6 +2515,7 @@ rel_sync_cache_publication_cb(Datum arg, int
cacheid, uint32 hashvalue)
{
entry->replicate_valid = false;
}
+
}
5.b) This change is done by mistake, this change can be removed:
@@ -4282,8 +4391,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb,
ReorderBufferTXN *txn,
/* read prefix */
memcpy(&prefix_size, data, sizeof(Size));
data += sizeof(Size);
- change->data.msg.prefix =
MemoryContextAlloc(rb->context,
-
prefix_size);
+ change->data.msg.prefix =
MemoryContextAlloc(rb->context, prefix_size);
5.c) This change is done by mistake, this change can be removed:
diff --git a/src/backend/replication/logical/proto.c
b/src/backend/replication/logical/proto.c
index ff8513e..eaec031 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -640,8 +640,8 @@ logicalrep_read_truncate(StringInfo in,
*/
void
logicalrep_write_message(StringInfo out, TransactionId xid, XLogRecPtr lsn,
- bool transactional,
const char *prefix, Size sz,
- const char *message)
+ bool transactional,
const char *prefix,
+ Size sz, const char *message)
Regards,
Vignesh
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-07-02 03:21:25 |
| Message-ID: | CAA4eK1+5zJAT_RYOAEOq8M33s196kR5sDyLQLUXd8Rnqr+iB0Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Jul 1, 2022 at 10:22 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Wed, Jun 29, 2022 at 3:25 PM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
>
> Thanks for the updated patch.
> Few comments on 0002 patch:
> 1) When we create a subscription for a publication with the existing
> default PUBLISH parameter having default value as
> 'insert,update,delete,truncate', we do an initial table sync to get
> the initial table data from the publisher to the subscriber. But in
> case of a publication created with 'ddl', the subscription expects the
> existing initial tables present in the publisher to be created
> beforehand in the subscriber. Should this be the default behavior?
> Should we do a ddl dump for all the tables and restore the ddl to the
> subscription while creating the subscription? Or is this planned as an
> option for the later version.
>
The idea is to develop initial sync (for ddl replication) as a
separate patch. But both need to be integrated at some point.
>
> 3) SYNTAX Support:
> Currently creation of "FOR TABLE" publication with ddl is supported.
> Should we allow support of ddl for "FOR TABLE" publication.
>
The above comment is unclear to me. It seems to me that in the first
sentence you are saying that the "FOR TABLE" syntax is supported, and
in the second sentence you are asking to allow support of it? I think
at this stage the focus is to build the core part of the feature
(allow ddl replication and deparsing support), and then we can discuss
the syntax further. Having said that, it will be good if we can support
table-level DDL replication as well in the patch, as you seem to be
suggesting.
--
With Regards,
Amit Kapila.
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-07-02 14:24:51 |
| Message-ID: | CALDaNm1HYy9qi-WZV7Wf50s5_JBXwNCLbjTm+4SZkvtwxs7rqw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Sat, Jul 2, 2022 at 8:51 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Jul 1, 2022 at 10:22 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Wed, Jun 29, 2022 at 3:25 PM houzj(dot)fnst(at)fujitsu(dot)com
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> >
> > Thanks for the updated patch.
> > Few comments on 0002 patch:
> > 1) When we create a subscription for a publication with the existing
> > default PUBLISH parameter having default value as
> > 'insert,update,delete,truncate', we do an initial table sync to get
> > the initial table data from the publisher to the subscriber. But in
> > case of a publication created with 'ddl', the subscription expects the
> > existing initial tables present in the publisher to be created
> > beforehand in the subscriber. Should this be the default behavior?
> > Should we do a ddl dump for all the tables and restore the ddl to the
> > subscription while creating the subscription? Or is this planned as an
> > option for the later version.
> >
>
> The idea is to develop initial sync (for ddl replication) as a
> separate patch. But both need to be integrated at some point.
Yes, that approach makes sense.
> >
> > 3) SYNTAX Support:
> > Currently creation of "FOR TABLE" publication with ddl is supported.
> > Should we allow support of ddl for "FOR TABLE" publication.
> >
>
> The above comment is unclear to me. It seems to me in the first
> sentence, you are saying that the "FOR TABLE" syntax is supported and
> in the second sentence, you are asking to allow support of it? I think
> at this stage, the focus is to build the core part of the feature
> (allow ddl replication and deparsing support), and then we can discuss
> more on Syntax. Having said that, it will be good if we can support
> table-level DDL replication as well in the patch as you seem to be
> suggesting.
I initially thought that supporting a "FOR TABLE" publication for ddl
might not be useful, as currently CREATE SUBSCRIPTION fails with a
"table does not exist" error. Now that the initial sync for ddl
replication will also be implemented, as mentioned in [1], this issue
will be handled. I agree with supporting table-level DDL replication.
Regards,
Vignesh
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-07-05 08:28:05 |
| Message-ID: | OS0PR01MB5716D3B79A607B547F2AB90D94819@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wednesday, June 29, 2022 6:48 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Jun 29, 2022 at 3:24 PM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Wednesday, June 29, 2022 11:07 AM Amit Kapila
> <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Jun 28, 2022 at 5:43 PM Amit Kapila
> > > <amit(dot)kapila16(at)gmail(dot)com>
> > > wrote:
> > > >
> > > > 5.
> > > > +static ObjTree *
> > > > +deparse_CreateStmt(Oid objectId, Node *parsetree)
> > > > {
> > > > ...
> > > > + tmp = new_objtree_VA("TABLESPACE %{tablespace}I", 0); if
> > > > + (node->tablespacename) append_string_object(tmp, "tablespace",
> > > > + node->tablespacename); else { append_null_object(tmp,
> > > > + node->"tablespace");
> > > > + append_bool_object(tmp, "present", false); }
> > > > + append_object_object(createStmt, "tablespace", tmp);
> > > > ...
> > > > }
> > > >
> > > > Why do we need to append the objects (tablespace, with clause,
> > > > etc.) when they are not present in the actual CREATE TABLE
> > > > statement? The reason to ask this is that this makes the string
> > > > that we want to send downstream much longer than the actual
> > > > statement given by the user on the publisher.
> > > >
> > >
> > > After thinking some more on this, it seems the user may want to
> > > optionally change some of these attributes, for example, on the
> > > subscriber, it may want to associate the table with a different
> > > tablespace. I think to address that we can append these additional
> > > attributes optionally, say via an additional parameter
> > > (append_all_options/append_all_attributes or something like that) in
> exposed APIs like deparse_utility_command().
> >
> > I agree and will research this part.
> >
>
> Okay, note that similar handling would be required at other places like
> deparse_ColumnDef. Few other comments on
> v9-0001-Functions-to-deparse-DDL-commands.
>
> 1.
> +static ObjElem *new_bool_object(bool value); static ObjElem
> +*new_string_object(char *value); static ObjElem
> +*new_object_object(ObjTree *value); static ObjElem
> +*new_array_object(List *array); static ObjElem
> +*new_integer_object(int64 value); static ObjElem
> +*new_float_object(float8 value);
>
> Here, new_object_object() seems to be declared out-of-order (not in sync with
> its order of definition). Similarly, see all other append_* functions.
Changed.
> 2. The function printTypmod in ddl_deparse.c appears to be the same as the
> function with the same name in format_type.c. If so, isn't it better to have a
> single copy of that function?
Changed.
> 3. As I pointed out yesterday, there is some similarity between
> format_type_extended and format_type_detailed. Can we try to extract
> common functionality? If this is feasible, it is better to do this as a separate
> patch. Also, this can obviate the need to expose printTypmod from
> format_type.c. I am not really sure if this will be better than the current one or
> not but it seems worth trying.
It seems the main logic in the two functions is a bit different. In
format_type_detailed, we always try to schema-qualify both user-defined and
built-in types, except for some special types (time, ...) which we cannot
schema-qualify. But in format_type_extended, we don't try to schema-qualify
built-in types. I will research this a bit more.
> 4.
> new_objtree_VA()
> {
> ...
> switch (type)
> + {
> + case ObjTypeBool:
> + elem = new_bool_object(va_arg(args, int)); break; case ObjTypeString:
> + elem = new_string_object(va_arg(args, char *)); break; case
> + ObjTypeObject:
> + elem = new_object_object(va_arg(args, ObjTree *)); break; case
> + ObjTypeArray:
> + elem = new_array_object(va_arg(args, List *)); break; case
> + ObjTypeInteger:
> + elem = new_integer_object(va_arg(args, int64)); break; case
> + ObjTypeFloat:
> + elem = new_float_object(va_arg(args, double)); break; case
> + ObjTypeNull:
> + /* Null params don't have a value (obviously) */ elem =
> + new_null_object();
> ...
>
> I feel here ObjType's should be handled in the same order as they are defined in
> ObjType.
Changed.
> 5. There appears to be common code among node_*_object() functions.
> Can we just have one function instead new_object(ObjType, void *val)?
> In the function based on type, we can typecast the value. Is there a reason why
> that won't be better than current one?
I tried to extract some common code into a new function new_object(ObjType,
char *name). I didn't use 'void *' because the user could pass a value of the
wrong type, which we cannot detect, and that seems a little unsafe.
> 6.
> deparse_ColumnDef()
> {
> ...
> /* Composite types use a slightly simpler format string */
> + if (composite)
> + column = new_objtree_VA("%{name}I %{coltype}T %{collation}s", 3,
> + "type", ObjTypeString, "column", "name", ObjTypeString,
> + coldef->colname, "coltype", ObjTypeObject, new_objtree_for_type(typid,
> + typmod)); else column = new_objtree_VA("%{name}I %{coltype}T
> + %{default}s
> %{not_null}s %{collation}s",
> + 3,
> + "type", ObjTypeString, "column",
> + "name", ObjTypeString, coldef->colname, "coltype", ObjTypeObject,
> + new_objtree_for_type(typid, typmod));
> ...
> }
>
> To avoid using the same arguments, we can define fmtstr for composite and
> non-composite types as the patch is doing in deparse_CreateStmt().
Changed.
> 7. It is not clear from comments or otherwise what should be considered for
> default format string in functions like
> deparse_ColumnDef() or deparse_CreateStmt().
I think the intention was to put most of the modifiable parts in the default
format string so that users can change what they want using the JSON functions.
But I think we could improve this, maybe by passing a parameter which decides
whether to keep these keywords in the default format string. I will research
this.
> 8.
> + * FIXME --- actually, what about default values?
> + */
> +static ObjTree *
> +deparse_ColumnDef_typed(Relation relation, List *dpcontext, ColumnDef
> +*coldef)
>
> I think we need to handle default values for this FIXME.
Changed.
Attached is the new version of the patch, which addresses the above
comments and the comments from [1][2].
I haven't addressed the latest comments from Vignesh[3] yet;
I will investigate them.
[1] https://www.postgresql.org/message-id/CAA4eK1K88SMoBq%3DDRA4XU-F3FG6qyzCjGMMKsPpcRBPRcrELrw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CALDaNm3rEA_zmnDMOCT7NqK4aAffhAgooLf8rXjUN%3DYwA8ASFw%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CALDaNm2nFPMxUo%3D0zRUUA-v3_eRwRY%2Bii5nnG_PU%2B6jT7ta9dA%40mail.gmail.com
Best regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v11-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 22.5 KB |
| v11-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 131.9 KB |
| v11-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 14.8 KB |
| v11-0002-Support-DDL-replication.patch | application/octet-stream | 125.6 KB |
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-07-08 02:26:28 |
| Message-ID: | CAHut+Ps9QyGw4KRFP50vRnB0tJKbB_TS1E7rZ_-+pc2Nvwv_zw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Here are some review comments for the patch v11-0001:
======
1. Commit message
This provides base for logical replication of DDL statements. Currently,
this provides support for:
SUGGESTION
This provides a base for logical replication of DDL statements. Currently,
the patch has support for:
======
2. src/backend/commands/ddl_deparse.c - <general>
I noticed that some of these function header comments end with periods and
some do not. Please add a period to every one of them for consistency.
~~~
3. src/backend/commands/ddl_deparse.c - <general>
There are quite a few functions in this file with no function comment.
Probably every function should have a comment.
List includes at least all these:
- deparse_ColumnIdentity
- RelationGetPartitionBound
- deparse_AlterOwnerStmt
- deparse_RenameStmt
- deparse_Seq_Cache
- deparse_Seq_Cycle
- deparse_Seq_IncrementBy
- deparse_Seq_Minvalue
- deparse_Seq_Maxvalue
- deparse_Seq_Startwith
- deparse_Seq_Restart
- deparse_Seq_OwnedBy
- deparse_AlterTableStmt
- deparse_CreateTableAsStmt
- deparse_drop_sequence
- deparse_drop_schema
- deparse_drop_index
- deparse_drop_table
~~~
4. src/backend/commands/ddl_deparse.c - <general>
Lots of places are making calls to the new_objtree_VA function but
some of them are a bit messy. I think the wrapping of the args to that
function needs to be revisited and made consistent indentation
everywhere to make them all easier to read. IMO it is easier when the
number of arg-groups is clear and each arg-group is on a new line.
Like this example:
column = new_objtree_VA("%{name}I WITH OPTIONS %{default}s %{not_null}s",
                        2,
                        "type", ObjTypeString, "column",
                        "name", ObjTypeString, coldef->colname);
~~~
5. src/backend/commands/ddl_deparse.c - <general>
Lots of the function comments are giving the function name again. It
does not seem necessary.
e.g (there many more are like this)
BEFORE
/* deparse_CreateSeqStmt
* deparse a CreateSeqStmt
*
* Given a sequence OID and the parsetree that create it, return an ObjTree
* representing the creation command.
*/
SUGGEST
/*
* Deparse a CreateSeqStmt
*
* Given a sequence OID and the parsetree that create it, return an ObjTree
* representing the creation command.
*/
~~~
6. src/backend/commands/ddl_deparse.c - typedefs
6a.
+} ObjType;
Shouldn't this be added to typedefs.list?
~
6b.
+typedef struct ObjTree
Ditto.
~
6c.
+typedef struct ObjElem
Ditto
~~~
7. src/backend/commands/ddl_deparse.c - format_type_detailed
+ }
+ *nspid = InvalidOid;
+
+ if (typemod >= 0)
+ *typemodstr = printTypmod("", typemod, typeform->typmodout);
+ else
+ *typemodstr = pstrdup("");
+
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ /*
+ * No additional processing is required for other types, so get the type
+ * name and schema directly from the catalog.
+ */
+ *nspid = typeform->typnamespace;
+ *typname = pstrdup(NameStr(typeform->typname));
+
+ if (typemod >= 0)
+ *typemodstr = printTypmod("", typemod, typeform->typmodout);
+ else
+ *typemodstr = pstrdup("");
+
+ ReleaseSysCache(tuple);
+}
The code can be simplified a bit by using if/else because the part:
+ if (typemod >= 0)
+ *typemodstr = printTypmod("", typemod, typeform->typmodout);
+ else
+ *typemodstr = pstrdup("");
+
+ ReleaseSysCache(tuple);
is common code.
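I.e. keep one copy of that common tail at the end of the function, roughly
(sketch only, reusing the names from the quoted hunk):

	/* Common tail, executed for both the special-case and the general path */
	if (typemod >= 0)
		*typemodstr = printTypmod("", typemod, typeform->typmodout);
	else
		*typemodstr = pstrdup("");

	ReleaseSysCache(tuple);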
~~~
8. src/backend/commands/ddl_deparse.c - append_bool_object
Why does this function have no assert like the others do?
+ Assert(name);
~~~
9. src/backend/commands/ddl_deparse.c - append_array_object
Why does this function have no assert like the others do?
+ Assert(name);
~~~
10. src/backend/commands/ddl_deparse.c - new_objtree_for_type
+ if (!OidIsValid(typnspid))
+ typnsp = pstrdup("");
+ else
+ typnsp = get_namespace_name_or_temp(typnspid);
Reversing this if/else will give slightly simpler code.
~~~
11. src/backend/commands/ddl_deparse.c - deparse_ColumnIdentity
+ ObjTree *tmp;
"tmp" doesn’t seem a very good variable name since that is also what
the function is returning.
~~~
12.
+ /* definition elemets */
Uppercase comment.
~~~
13.
+ /* we purposefully do not emit OWNED BY here */
Uppercase comment.
~~~
14. src/backend/commands/ddl_deparse.c - deparse_ColumnDef
+ /*
+ * Inherited columns without local definitions must not be emitted. XXX --
+ * maybe it is useful to have them with "present = false" or some such?
+ */
I think the XXX should be on a new line; otherwise the note just gets
lost in the comment.
~~~
15.
+ if (saw_notnull)
+ append_string_object(column, "not_null", "NOT NULL");
+ else
+ append_string_object(column, "not_null", "");
Perhaps simple code like this is more neatly written as:
append_string_object(column, "not_null", saw_notnull ? "NOT NULL" : "");
~~~
16. src/backend/commands/ddl_deparse.c - deparse_ColumnDef_typed
+ if (saw_notnull)
+ append_string_object(column, "not_null", "NOT NULL");
+ else
+ append_string_object(column, "not_null", "");
Ditto previous comment #15.
~~~
17. src/backend/commands/ddl_deparse.c - deparseTableElements
Should the function name be "deparse_TableElements" to match the
pattern of all the others?
~~~
18. src/backend/commands/ddl_deparse.c - deparse_OnCommitClause
+ ObjTree *tmp;
I don’t think "tmp" is a good name here because the function is returning it.
~~~
19. src/backend/commands/ddl_deparse.c - deparse_DefElem
+ set = new_objtree_VA("%{label}s = %{value}L", 1,
+ "value", ObjTypeString,
+ elem->arg ? defGetString(elem) :
+ defGetBoolean(elem) ? "TRUE" : "FALSE");
The double ternary operators here are a bit hard to read. Maybe add
some extra parens just to improve the readability?
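For example (same code as above, just parenthesized):

	set = new_objtree_VA("%{label}s = %{value}L", 1,
						 "value", ObjTypeString,
						 elem->arg ? defGetString(elem) :
						 (defGetBoolean(elem) ? "TRUE" : "FALSE"));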
~~~
20. src/backend/commands/ddl_deparse.c - deparse_ColumnSetOptions
+ ObjTree *tmp;
I don’t think "tmp" is a very good name here because the function is
returning this.
~~~
21. src/backend/commands/ddl_deparse.c - deparse_RelSetOptions
+ ObjTree *tmp;
I don’t think "tmp" is a very good name here because the function is
returning this.
~~~
22 src/backend/commands/ddl_deparse.c - pg_get_indexdef_detailed
+
+
+ /*
+ * Fetch the pg_am tuple of the index' access method
+ */
Spurious whitespace line.
~~~
23.
+ /* output index AM */
Uppercase comment
~~~
24.
+ sep = "";
+ for (keyno = 0; keyno < idxrec->indnatts; keyno++)
+ {
+ AttrNumber attnum = idxrec->indkey.values[keyno];
+ int16 opt = indoption->values[keyno];
+ Oid keycoltype;
+ Oid keycolcollation;
+ Oid indcoll;
+
+ appendStringInfoString(&definitionBuf, sep);
+ sep = ", ";
This assignment of "sep" seemed a bit strange. Can't it be more
easily written like:
appendStringInfoString(&definitionBuf, keyno == 0 ? "" : ", ");
~~~
25.
+ /* expressional index */
Uppercase comment.
~~~
26.
+ /* if it supports sort ordering, report DESC and NULLS opts */
Uppercase comment
~~~
27.
+ /* output tablespace */
Uppercase comment
~~~
28.
+ /* report index predicate, if any */
Uppercase comment
~~~
29.
+
+ /* all done */
Kind of redundant/meaningless. Consider removing this comment.
~~~
30. src/backend/commands/ddl_deparse.c - deparse_RenameStmt
+ switch (node->renameType)
+ {
+ case OBJECT_SCHEMA:
+ {
+ renameStmt =
+ new_objtree_VA("ALTER SCHEMA %{identity}I RENAME TO %{newname}I",
+ 0);
+ append_string_object(renameStmt, "identity", node->subname);
+ }
+ break;
+ default:
+ elog(ERROR, "unsupported object type %d", node->renameType);
+ }
The switch with a single case seems a bit overkill here. Wouldn’t just
an "if" be more appropriate?
~~~
31. src/backend/commands/ddl_deparse.c - deparse_CreateSeqStmt
This function looked very similar to the other function
deparse_ColumnIdentity. Is it worth trying to combine these or have
one of them just delegate to the other to reduce the cut/paste code?
~~~
32. src/backend/commands/ddl_deparse.c - deparse_AlterTableStmt
+ default:
+ elog(ERROR, "unexpected relkind %d", rel->rd_rel->relkind);
+ reltype = NULL;;
Does the assignment after the elog achieve anything?
~~~
33.
+ /* if no constraint was altered, silently skip it */
Uppercase comment
~~~
34.
+ /* add the TYPE clause */
Uppercase comment
~~~
35.
+ /* add a COLLATE clause, if needed */
Uppercase comment
~~~
36.
+ /* if not a composite type, add the USING clause */
Uppercase comment
~~~
37.
+ /* if it's a composite type, add the CASCADE clause */
Uppercase comment
~~~
38. src/backend/commands/ddl_deparse.c - deparse_drop_sequence
38a.
+ command = JsonbToCString(&str, &jsonb->root, 128);
Is there some more appropriate constant to use here instead of the
hardwired 128?
~~
38b. deparse_drop_schema
ditto
~~
38c. deparse_drop_index
ditto
~~
38d. deparse_drop_table
ditto
~~~
39. src/backend/commands/ddl_deparse.c - ddl_deparse_to_json
+ if (command)
+ PG_RETURN_TEXT_P(CStringGetTextDatum(command));
+ else
+ PG_RETURN_NULL();
The 'else’ keyword is not needed. Just do PG_RETURN_NULL();
======
40. src/backend/commands/ddl_json.c - <general>
Many (but not all) of these comments (particularly the function header
comments) seem to have double blank spaces in them after periods. I
don’t think it is normal. Please remove the extra spaces
~~~
41. src/backend/commands/ddl_json.c - <general>
All the function header comment sentences should start with uppercase. E.g. in
many places:
"expand" -> "Expand"
~~~
42. src/backend/commands/ddl_json.c -
42.a
+typedef enum
+{
+ SpecTypename,
+ SpecOperatorname,
+ SpecDottedName,
+ SpecString,
+ SpecNumber,
+ SpecStringLiteral,
+ SpecIdentifier,
+ SpecRole
+} convSpecifier;
Add this to typedefs.list?
~~
42.b
+typedef enum
+{
+ tv_absent,
+ tv_true,
+ tv_false
+} trivalue;
Add this to typedefs.list?
~~~
43. src/backend/commands/ddl_json.c - find_string_in_jsonbcontainer
+/*
+ * Given a JsonbContainer, find the JsonbValue with the given key name in it.
+ * If it's of a type other than jbvString, an error is raised. If it doesn't
+ * exist, an error is raised if missing_ok; otherwise return NULL.
+ *
+ * If it exists and is a string, a freshly palloc'ed copy is returned.
+ *
+ * If *length is not NULL, it is set to the length of the string.
+ */
+static char *
+find_string_in_jsonbcontainer(JsonbContainer *container, char *keyname,
+ bool missing_ok, int *length)
"an error is raised if missing_ok" --> I think this should say an
error is raised *unless* missing_ok
~~~
44.
+ if (value == NULL)
+ {
+ if (missing_ok)
+ return NULL;
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing element \"%s\" in json object", keyname)));
+ }
For some reason, it seems more intuitive IMO to handle the error case
first here. YMMV.
if (value == NULL)
{
if (!missing_ok)
ereport(...)
return NULL;
}
~~~
45. src/backend/commands/ddl_json.c - expand_fmt_recursive
+ /*
+ * found array separator delimiter; element name is now
+ * complete, start filling the separator.
+ */
Uppercase comment
~~~
46.
+ /* Validate that we got an array if the format string specified one. */
+
+ /* And finally print out the data */
+ if (is_array)
Something seems strange for these comments to be separated like this.
~~~
47 src/backend/commands/ddl_json.c - expand_jsonval_identifier
+ str = pnstrdup(jsonval->val.string.val,
+ jsonval->val.string.len);
Unnecessary wrapping
~~~
48. src/backend/commands/ddl_json.c - expand_jsonval_dottedname
+/*
+ * Expand a json value as a dot-separated-name. The value must be of type
+ * object and must contain elements "schemaname" (optional), "objname"
+ * (mandatory), "attrname" (optional). Double quotes are added to each element
+ * as necessary, and dot separators where needed.
+ *
+ * One day we might need a "catalog" element as well, but no current use case
+ * needs that.
+ */
Does it make sense to say "must" contain elements that are optional elements?
~~~
49. src/backend/commands/ddl_json.c - expand_jsonval_strlit
+ /* easy case: if there are no ' and no \, just use a single quote */
Uppercase comment
~~~
50.
+ dqnextchar %= sizeof(dqsuffixes) - 1;
This statement looks quite confusing. Maybe add some parens for better readability?
~~~
51.
+ /* add trailing $ */
Uppercase comment.
~~~
52. src/backend/commands/ddl_json.c - expand_one_jsonb_element
+static bool
+expand_one_jsonb_element(StringInfo out, char *param, JsonbValue *jsonval,
+ convSpecifier specifier, const char *fmt)
In the other function the StringInfo buffer was called "buf" instead
of "out". Either one is OK but I think you should use consistent
naming for all the functions like this. Check all the functions – not
just this one.
~~~
53. src/backend/commands/ddl_json.c - expand_jsonb_array
+ if ((container->header & JB_FARRAY) == 0)
I think there is a macro you should use designed exactly for this –
see JsonContainerIsArray(jc)
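e.g. just the condition would change, keeping the same sense:

if (!JsonContainerIsArray(container))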
~~~
54.
+ switch (type)
+ {
+ case WJB_ELEM:
+ if (!first)
+ appendStringInfoString(out, arraysep);
+
+ if (expand_one_jsonb_element(out, param, &v, specifier, NULL))
+ first = false;
+ else
+ {
+ if (!first)
+ {
+ /* remove the array separator */
+ out->len -= arrayseplen;
+ out->data[out->len] = '\0';
+ }
+ }
+ break;
+ }
Why have a switch with just a single case?
~~~
55.
+ if (!first)
+ appendStringInfoString(out, arraysep);
+
+ if (expand_one_jsonb_element(out, param, &v, specifier, NULL))
+ first = false;
+ else
+ {
+ if (!first)
+ {
+ /* remove the array separator */
+ out->len -= arrayseplen;
+ out->data[out->len] = '\0';
+ }
+ }
It looks a bit strange to first write the separator to the buffer and then
find that you shouldn't have written it, so you have to undo it. Is this
the best way that this can be handled? Maybe there is no choice.
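One possible alternative (just a sketch, assuming expand_one_jsonb_element can
write into any StringInfo) is to expand into a scratch buffer first and only
emit the separator when something was actually produced:

StringInfoData scratch;

initStringInfo(&scratch);
if (expand_one_jsonb_element(&scratch, param, &v, specifier, NULL))
{
    if (!first)
        appendStringInfoString(out, arraysep);
    appendBinaryStringInfo(out, scratch.data, scratch.len);
    first = false;
}
pfree(scratch.data);

Not sure it is worth the extra copying though.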
~~~
56.
+ /* remove the array separator */
Uppercase comment
~~~~
57. src/backend/commands/ddl_json.c - ddl_deparse_json_to_string
+char *
+ddl_deparse_json_to_string(char *json_str)
Missing function comment
~~~
58. src/backend/commands/ddl_json.c - ddl_deparse_expand_command
+Datum
+ddl_deparse_expand_command(PG_FUNCTION_ARGS)
This function name seems a bit strange - see also comment #62
======
59. src/backend/commands/sequence.c - get_sequence_values
+ /* open and AccessShareLock sequence */
Uppercase comment
~~~
60.
+ retSeq = palloc(sizeof(FormData_pg_sequence_data));
+
+ /* open and AccessShareLock sequence */
+ init_sequence(sequenceId, &elm, &seqrel);
+
+ if (pg_class_aclcheck(sequenceId, GetUserId(),
+ ACL_SELECT | ACL_UPDATE | ACL_USAGE) != ACLCHECK_OK)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied for sequence %s",
+ RelationGetRelationName(seqrel))));
+
+ seq = read_seq_tuple(seqrel, &buf, &seqtuple);
+
+ memcpy(retSeq, seq, sizeof(FormData_pg_sequence_data));
I thought maybe better if the palloc could be done later, paired with
the memcpy, since it is not needed before then.
======
61. src/backend/utils/adt/ruleutils.c - pg_get_partkeydef_simple
+char *
+pg_get_partkeydef_simple(Oid relid)
Missing function comment
======
62. src/include/catalog/pg_proc.dat
+{ oid => '4642', descr => 'ddl deparse',
+ proname => 'ddl_deparse_to_json', prorettype => 'text',
+ proargtypes => 'pg_ddl_command', prosrc => 'ddl_deparse_to_json' },
+{ oid => '4643', descr => 'json to string',
+ proname => 'ddl_deparse_expand_command', prorettype => 'text',
+ proargtypes => 'text', prosrc => 'ddl_deparse_expand_command' },
My 1st impression was that the name "ddl_deparse_expand_command" does not
seem to reflect the description very well...
Maybe calling it something like "ddl_json_to_command" is more accurate?
======
63. src/include/utils/builtins.h
@@ -118,6 +118,7 @@ extern char *format_type_extended(Oid type_oid,
int32 typemod, bits16 flags);
extern char *format_type_be(Oid type_oid);
extern char *format_type_be_qualified(Oid type_oid);
extern char *format_type_with_typemod(Oid type_oid, int32 typemod);
+extern char *printTypmod(const char *typname, int32 typmod, Oid typmodout);
Notice that every format_type function name looks like
format_type_XXX. Now that you have changed printTypmod to be extern,
I wonder whether the name should also be changed (e.g.
format_type_printmod) to have consistent function naming.
------
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-07-12 09:03:33 |
| Message-ID: | OS0PR01MB571642FDCB049FC00EBDA5EC94869@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Lists: | pgsql-general pgsql-hackers |
On Friday, July 8, 2022 10:26 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> Here are some review comments for the patch v11-0001:
Thanks for the comments.
> 4. src/backend/commands/ddl_deparse.c - <general>
>
> Lots of places are making calls to the new_objtree_VA function but some of them are a bit messy. I think the wrapping of the args to that function needs to be revisited and made consistent indentation everywhere to make them all easier to read. IMO it is easier when the number of arg-groups is clear and each arg-group is on a new line.
> Like this example:
>
> column = new_objtree_VA("%{name}I WITH OPTIONS %{default}s %{not_null}s",
> 2,
> "type", ObjTypeString, "column",
> "name", ObjTypeString,
> coldef->colname);
I think both your suggestion and the following style look fine to me:
new_objtree_VA(fmt, num,
               xxobj1,
               xxobj2);
So, I only changed the function calls that used other styles.
> 30. src/backend/commands/ddl_deparse.c - deparse_RenameStmt
>
> + switch (node->renameType)
> + {
> + case OBJECT_SCHEMA:
> + {
> + renameStmt =
> + new_objtree_VA("ALTER SCHEMA %{identity}I RENAME TO %{newname}I",
> + 0);
> + append_string_object(renameStmt, "identity", node->subname); } break;
> + default:
> + elog(ERROR, "unsupported object type %d", node->renameType); }
> The switch with single case seems a bit overkill here. Wouldn’t just "if" be
> more appropriate?
I think it was intended for other RENAME deparsing which hasn't been added yet
but will be added to the patch later. So, I didn't change this.
> 31. src/backend/commands/ddl_deparse.c - deparse_CreateSeqStmt
>
> This function looked very similar to the other function deparse_ColumnIdentity.
> Is it worth trying to combine these or have one of them just delegate to the
> other to reduce the cut/paste code?
Since they are used to deparse different commands (one for column identity,
the other for CREATE SEQUENCE), I think the current style is fine.
> 40. src/backend/commands/ddl_json.c - <general>
>
> Many (but not all) of these comments (particularly the function header
> comments) seem to have double blank spaces in them after periods. I don’t
> think it is normal. Please remove the extra spaces
I think this style is fine, as I can see it used in many other existing comments.
> 44.
> + if (value == NULL)
> + {
> + if (missing_ok)
> + return NULL;
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("missing element \"%s\" in json object", keyname))); }
>
> For some reason, it seems more intuitive IMO to handle the error case first here. YMMV.
Not sure about this.
> 62. src/include/catalog/pg_proc.dat
>
> +{ oid => '4642', descr => 'ddl deparse',
> + proname => 'ddl_deparse_to_json', prorettype => 'text',
> + proargtypes => 'pg_ddl_command', prosrc => 'ddl_deparse_to_json' }, {
> +oid => '4643', descr => 'json to string',
> + proname => 'ddl_deparse_expand_command', prorettype => 'text',
> + proargtypes => 'text', prosrc => 'ddl_deparse_expand_command' },
>
> My 1st impression was that the name "ddl_deparse_expand_command" does not seem to reflect the description very well...
>
> Maybe calling it something like "ddl_json_to_command" is more accurate ?
I think "expand command" is close to the usage of this function so didn't change the
name for now. But I adjust the description here.
> 63. src/include/utils/builtins.h
>
> @@ -118,6 +118,7 @@ extern char *format_type_extended(Oid type_oid,
> int32 typemod, bits16 flags);
> extern char *format_type_be(Oid type_oid);
> extern char *format_type_be_qualified(Oid type_oid);
> extern char *format_type_with_typemod(Oid type_oid, int32 typemod);
> +extern char *printTypmod(const char *typname, int32 typmod, Oid
> +typmodout);
>
> Notice that every format_type function name looks like format_type_XXX.
> Now that you have changed printTypmod to be extern, I wonder whether the
> name should also be changed (e.g. format_type_printmod) to have consistent
> function naming.
I am not sure about this as the current name looks fine to me.
I agreed with the other comments and addressed them in the new version patch.
Attach the V12 patch set which includes the following changes:
* Address comments from Peter[1].
* Refactor the deparser and provide an option to control whether to output the "not present" syntax part.
And for DDL replication, we don't WAL log the "not present" syntax string for now.
* Address most comments from Vignesh[2], except the one about pg_dump
vs. the event trigger for DDL replication, which needs some more research.
[1] https://www.postgresql.org/message-id/CAHut%2BPs9QyGw4KRFP50vRnB0tJKbB_TS1E7rZ_-%2Bpc2Nvwv_zw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CALDaNm2nFPMxUo%3D0zRUUA-v3_eRwRY%2Bii5nnG_PU%2B6jT7ta9dA%40mail.gmail.com
Best regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v12-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 22.5 KB |
| v12-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 134.2 KB |
| v12-0002-Support-DDL-replication.patch | application/octet-stream | 123.2 KB |
| v12-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-07-20 12:28:26 |
| Message-ID: | OS0PR01MB5716FCCEF8009E4464042476948E9@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Lists: | pgsql-general pgsql-hackers |
On Tuesday, July 12, 2022 5:04 PM houzj(dot)fnst(at)fujitsu(dot)com <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> On Friday, July 8, 2022 10:26 AM Peter Smith <smithpb2250(at)gmail(dot)com>
> wrote:
>
>
> > Here are some review comments for the patch v11-0001:
>
> Attach the V12 patch set which include the following changes:
>
> * Address comments from peter[1]
> * Refactor the deparser and provide an option to control whether output the
> "not present" syntax part.
> And for DDL replication, we don't WAL log the "not present" syntax string for
> now.
> * Address most comments from Vignesh[2] except the one about pg_dump
> vs the event trigger for ddl replication which need some more research.
Attach the new version patch set which added support for CREATE/ALTER FUNCTION,
CREATE/ALTER PROCEDURE, CREATE/ALTER TRIGGER, RENAME TABLE/INDEX/SEQUENCE.
Thanks to Vignesh for helping off list.
Best regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v13-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v13-0002-Support-DDL-replication.patch | application/octet-stream | 123.5 KB |
| v13-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 166.6 KB |
| v13-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 22.5 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-07-22 21:18:56 |
| Message-ID: | CAAD30ULBUSHG4n-_zZ7yLGPri-LkS0o+vp64WCKj-mFcF27ekg@mail.gmail.com |
| Lists: | pgsql-general pgsql-hackers |
Hello,
Here is a patch that supports replication of global object commands;
these include ROLE statements, database statements, and tablespace statements.
The patch should be applied on top of the v13 DDL replication patch set that
ZJ Hou sent in the previous email.
Global object commands are different from other DDL commands in that:
1. Global object commands are allowed to be executed in any database
2. Global objects are not schema qualified
3. Global object commands are not captured by event triggers
This patch supports replication of global object commands by WAL
logging the command using the same function used for DDL logging -
LogLogicalDDLMessage - towards the end of standard_ProcessUtility.
Because global objects are not schema qualified, we can skip the deparser
invocation and directly log the original command string for replay on
the subscriber.
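For illustration, the logging step is roughly like the following sketch near
the end of standard_ProcessUtility (the exact LogLogicalDDLMessage signature is
defined by the DDL replication patches and is assumed here to mirror
LogLogicalMessage; the statement list and the prefix are placeholders, not
exhaustive):

/*
 * Sketch only: WAL log global object commands as-is so the subscriber can
 * replay the original command string.
 */
if (IsA(parsetree, CreateRoleStmt) ||
    IsA(parsetree, AlterRoleStmt) ||
    IsA(parsetree, CreatedbStmt) ||
    IsA(parsetree, CreateTableSpaceStmt))
    LogLogicalDDLMessage("global", queryString,    /* placeholder prefix */
                         strlen(queryString) + 1, true);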
A key problem is that global objects can become inconsistent between the
publisher and the subscriber if a command changes a global object in a
database (on the source side) which doesn't configure logical replication.
I think we can work in the following directions in order to avoid such
inconsistency:
1. Introduce a publication option for global object command replication
and document that logical replication of global object commands should
preferably be configured on all databases. Otherwise inconsistency can
happen if a command changes a global object in a database which doesn't
configure logical replication.
2. Introduce database cluster level logical replication to avoid such
inconsistency; this is especially handy when there is a large number of
databases to configure for logical replication.
Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| 0005-Support-replication-of-global-object-commands-these-.patch | application/octet-stream | 8.9 KB |
| From: | Joe Conway <mail(at)joeconway(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-07-23 15:33:43 |
| Message-ID: | [email protected] |
| Lists: | pgsql-general pgsql-hackers |
On 7/22/22 17:18, Zheng Li wrote:
> Here is a patch that supports replication of global object commands,
> these include ROLE statements, database statements and tablespace statements.
> The patch should be applied on top of the v13 DDL replication patch set that
> ZJ Hou sent in the previous email.
>
> Global objects commands are different from other DDL commands in
> that:
> 1. Global objects commands are allowed to be executed in any databases
> 2. Global objects are not schema qualified
> 3. Global objects commands are not captured by event triggers
>
> This patch supports global objects commands replication by WAL
> logging the command using the same function for DDL logging -
> LogLogicalDDLMessage, towards the end of standard_ProcessUtility.
> Because global objects are not schema qualified, we can skip the deparser
> invocation and directly log the original command string for replay on
> the subscriber.
I have not looked at the patch but +1 for the general concept. Seems
like you might want to start a separate thread, perhaps after the
currently running commitfest is over.
> A key problem is global objects can get inconsistent between the
> publisher and the subscriber if a command changes the global object
> in a database (on the source side) which doesn't configure logical replication.
> I think we can work on the following directions in order to avoid such
> inconsistency:
>
> 1. Introduce a publication option for global objects command replication
> and document that logical replication of global objects commands is preferred
> to be configured on all databases. Otherwise inconsistency can happen
> if a command changes the global object in a database which doesn't configure
> logical replication.
>
> 2. Introduce database cluster level logical replication to avoid
> such inconsistency, this is especially handy when there is a large
> number of databases to configure for logical replication.
I would strongly favor #2, although I admittedly have no idea what
complexities it adds.
--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Joe Conway <mail(at)joeconway(dot)com> |
| Cc: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-07-23 16:23:51 |
| Message-ID: | CAAD30U+RrK3cqMG-z11L5bU6RfqjPLZa3kHyi-35g=562Avafw@mail.gmail.com |
| Lists: | pgsql-general pgsql-hackers |
On Sat, Jul 23, 2022 at 11:33 AM Joe Conway <mail(at)joeconway(dot)com> wrote:
>
> On 7/22/22 17:18, Zheng Li wrote:
> > Here is a patch that supports replication of global object commands,
> > these include ROLE statements, database statements and tablespace statements.
> > The patch should be applied on top of the v13 DDL replication patch set that
> > ZJ Hou sent in the previous email.
> >
> > Global objects commands are different from other DDL commands in
> > that:
> > 1. Global objects commands are allowed to be executed in any databases
> > 2. Global objects are not schema qualified
> > 3. Global objects commands are not captured by event triggers
> >
> > This patch supports global objects commands replication by WAL
> > logging the command using the same function for DDL logging -
> > LogLogicalDDLMessage, towards the end of standard_ProcessUtility.
> > Because global objects are not schema qualified, we can skip the deparser
> > invocation and directly log the original command string for replay on
> > the subscriber.
>
> I have not looked at the patch but +1 for the general concept. Seems
> like you might want to start a separate thread, perhaps after the
> currently running commitfest is over.
Thanks for the suggestion. I'll start a new thread on the replication of
global object commands. I think it's different enough to get its own attention.
>
> > A key problem is global objects can get inconsistent between the
> > publisher and the subscriber if a command changes the global object
> > in a database (on the source side) which doesn't configure logical replication.
> > I think we can work on the following directions in order to avoid such
> > inconsistency:
> >
> > 1. Introduce a publication option for global objects command replication
> > and document that logical replication of global objects commands is preferred
> > to be configured on all databases. Otherwise inconsistency can happen
> > if a command changes the global object in a database which doesn't configure
> > logical replication.
> >
> > 2. Introduce database cluster level logical replication to avoid
> > such inconsistency, this is especially handy when there is a large
> > number of databases to configure for logical replication.
>
> I would strongly favor #2, although I admittedly have no idea what
> complexities it adds.
I will also start a new thread once we have more concrete plans on this.
Regards,
Zheng
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-07-25 06:13:36 |
| Message-ID: | CAA4eK1++QWOyfEei3wvC8kFLzqy7oYg2125gGebFZsO+VbOE1A@mail.gmail.com |
| Lists: | pgsql-general pgsql-hackers |
On Sat, Jul 23, 2022 at 2:49 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> Hello,
>
> Here is a patch that supports replication of global object commands,
> these include ROLE statements, database statements and tablespace statements.
> The patch should be applied on top of the v13 DDL replication patch set that
> ZJ Hou sent in the previous email.
>
> Global objects commands are different from other DDL commands in
> that:
> 1. Global objects commands are allowed to be executed in any databases
> 2. Global objects are not schema qualified
> 3. Global objects commands are not captured by event triggers
>
> This patch supports global objects commands replication by WAL
> logging the command using the same function for DDL logging -
> LogLogicalDDLMessage, towards the end of standard_ProcessUtility.
>
I noticed that LogLogicalDDLMessage() uses MyDatabaseId and then
decoding can take some action (filtering) based on that. Is it okay to
use that function for global objects? If so, you might want to add a
comment for the same.
> Because global objects are not schema qualified, we can skip the deparser
> invocation and directly log the original command string for replay on
> the subscriber.
>
> A key problem is global objects can get inconsistent between the
> publisher and the subscriber if a command changes the global object
> in a database (on the source side) which doesn't configure logical replication.
> I think we can work on the following directions in order to avoid such
> inconsistency:
>
> 1. Introduce a publication option for global objects command replication
> and document that logical replication of global objects commands is preferred
> to be configured on all databases. Otherwise inconsistency can happen
> if a command changes the global object in a database which doesn't configure
> logical replication.
>
> 2. Introduce database cluster level logical replication to avoid such
> inconsistency,
> this is especially handy when there is a large number of databases to
> configure for logical
> replication.
>
In general, I agree with your comment below that we can work on this
after we have some more concrete plans/discussions. I think we can
probably consider this when we have more discussion around the
publication commands for the DDL objects. However, it would be good if
you can add some more details about the above two options.
As per my understanding, the overall work on this project includes the
following sub-tasks:
a. DDL Deparsing: This is required to support DDL replication of
non-global objects. The work for this is in progress, this is based on
prior work by Alvaro.
b. DDL Replication: This includes replication of DDL commands based on
event triggers and DDL deparsing. The work on this is also in
progress.
c. DDL Replication of global objects: It requires a different approach
due to the reasons quoted above in your email. Zheng has started
working on it.
d. Initial Sync: I think there is a brief discussion about this in the
thread but no concrete proposal yet. I think it is important to solve
this part of the puzzle as well to have an overall design ready for
this project. Do let me know if you, Sawada-San, or anybody else
intends to work on it. I think that will help avoid overlap of work.
--
With Regards,
Amit Kapila.
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-07-28 03:48:11 |
| Message-ID: | OS0PR01MB57168B82B65B491596F5832894969@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Lists: | pgsql-general pgsql-hackers |
On Wednesday, July 20, 2022 8:28 PM houzj(dot)fnst(at)fujitsu(dot)com wrote:
> On Tuesday, July 12, 2022 5:04 PM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > On Friday, July 8, 2022 10:26 AM Peter Smith <smithpb2250(at)gmail(dot)com>
> > wrote:
> >
> >
> > > Here are some review comments for the patch v11-0001:
> >
> > Attach the V12 patch set which include the following changes:
> >
> > * Address comments from peter[1]
> > * Refactor the deparser and provide an option to control whether
> > output the "not present" syntax part.
> > And for DDL replication, we don't WAL log the "not present" syntax
> > string for now.
> > * Address most comments from Vignesh[2] except the one about pg_dump
> > vs the event trigger for ddl replication which need some more research.
>
> Attach the new version patch set which added support for CREATE/ALTER
> FUNCTION, CREATE/ALTER PROCEDURE, CREATE/ALTER TRIGGER, RENAME
> TABLE/INDEX/SEQUENCE.
>
> Thanks to Vignesh for helping off list.
Attach the new version patch set which added support for
CREATE/ALTER OPERATOR [CLASS/FAMILY].
The DROP case needs some more handling and I will update it later.
Best regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v14-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 184.3 KB |
| v14-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.1 KB |
| v14-0002-Support-DDL-replication.patch | application/octet-stream | 123.8 KB |
| v14-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 22.5 KB |
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-08-01 04:18:15 |
| Message-ID: | CAHut+Pv=Ai80B7ObcaQa56hpf_HnMs=MaFiS9afkH8DtxDd6ng@mail.gmail.com |
| Lists: | pgsql-general pgsql-hackers |
FYI, I found that the v14-0001 patch does not currently apply [1]. Can
you please rebase it?
------
[1] http://cfbot.cputube.org/patch_38_3595.log
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-08-02 03:20:55 |
| Message-ID: | OS0PR01MB5716009FDCCC0B50BCB14A99949D9@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Lists: | pgsql-general pgsql-hackers |
On Monday, August 1, 2022 12:18 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> FYI, I found that the v14-0001 patch does not currently apply [1]. Can you please
> rebase it?
Thanks for reporting. Here is the rebased version.
Best regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v15-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 22.5 KB |
| v15-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 182.6 KB |
| v15-0002-Support-DDL-replication.patch | application/octet-stream | 123.8 KB |
| v15-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.1 KB |
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-08-04 02:51:59 |
| Message-ID: | OS0PR01MB571633E6F582558AFC0BBB28949F9@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Lists: | pgsql-general pgsql-hackers |
On Thursday, July 28, 2022 11:48 AM houzj(dot)fnst(at)fujitsu(dot)com wrote:
> >
> > Attach the new version patch set which added support for CREATE/ALTER
> > FUNCTION, CREATE/ALTER PROCEDURE, CREATE/ALTER TRIGGER, RENAME
> > TABLE/INDEX/SEQUENCE.
> >
> > Thanks to Vignesh for helping off list.
>
> Attach the new version patch set which added support for CREATE/ALTER
> OPERATOR [CLASS/FAMILY].
> The DROP case need some more handling and I will update it later.
Attach the new version patch which adds support for DROP/RENAME OPERATOR/FUNCTION,
CREATE DOMAIN, and CREATE/ALTER/DROP CAST in the deparser.
Best regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v16-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 22.5 KB |
| v16-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 194.9 KB |
| v16-0002-Support-DDL-replication.patch | application/octet-stream | 124.3 KB |
| v16-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.1 KB |
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-08-05 06:03:29 |
| Message-ID: | CAHut+PvpGCmqEodeGf8R=ajFmf-Bac_LPHeN1iPb2gAxXwzn3A@mail.gmail.com |
| Lists: | pgsql-general pgsql-hackers |
Hi Hou-san, here are my review comments for the patch v15-0001:
======
1. Commit Message
CREATE/ALTER/DROP TABLE (*)
At first, I thought "(*)" looks like a SQL syntax element.
SUGGESTION:
CREATE/ALTER/DROP TABLE - - Note #1, Note #2
...
Note #1 – blah blah
Note #2 – yada yada
======
2. src/backend/commands/ddl_deparse.c - General
2.1
Lots of the deparse_XXX functions are in random places scattered around
in this module. Since there are so many, I think it's better to have the
functions arranged alphabetically to make them easier to find. (And if
there are several functions that logically "belong" together then
those should be re-named so they will be grouped together
alphabetically...)
The same applies to other functions – not just the deparse_XXX ones.
2.2
There are lots of 'tmp' (or 'tmp2') variables in this file. Sometimes
'tmp' is appropriate (or maybe 'tmpobj' would be better) but in other
cases it seems like there should be a better name than 'tmp'. Please
search all these and replace where you can use a more meaningful name
than tmp.
2.3
Pointer NULL comparisons are not done consistently all through the
file. E.g. Sometimes you do tree->fmtinfo == NULL, but in others you
do like if (!tree->fmtinfo). It's no big deal whichever way you want
to use, but at least the comparisons involving the same variables
should IMO use the same style consistently.
~~~
3. src/backend/commands/ddl_deparse.c - format_type_detailed
3.1
+ * - typename is set to the type name, without quotes
But the param is called 'typname', not 'typename'
3.2
+ * - typmod is set to the typemod, if any, as a string with parens
I think you mean:
"typmod is set" -> "typemodstr is set"
3.3
+ if (IsTrueArrayType(typeform) &&
+ typeform->typstorage != TYPSTORAGE_PLAIN)
+ {
+ /* Switch our attention to the array element type */
+ ReleaseSysCache(tuple);
+ tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(array_base_type));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for type %u", type_oid);
+
+ typeform = (Form_pg_type) GETSTRUCT(tuple);
+ type_oid = array_base_type;
+ *typarray = true;
+ }
+ else
+ *typarray = false;
Maybe this if/else can be simplified
*typarray = IsTrueArrayType(typeform) && typeform->typstorage !=
TYPSTORAGE_PLAIN;
if (*typarray)
{
...
}
3.4
+ /* otherwise, WITH TZ is added by typmod. */
Uppercase comment
~~~
4. src/backend/commands/ddl_deparse.c - append_object_to_format_string
+ for (cp = sub_fmt; cp < end_ptr; cp++)
+ {
+ if (*cp == '{')
+ {
+ start_copy = true;
+ continue;
+ }
What's this logic going to do if it encounters "{{" - it looks like it
will just keep going but wouldn't that be a name error to have a "{"
in it?
~~~
5. src/backend/commands/ddl_deparse.c - append_bool_object
+append_bool_object(ObjTree *tree, char *sub_fmt, bool value)
+{
+ ObjElem *param;
+ char *object_name = sub_fmt;
+ bool is_present_flag = false;
+
+ Assert(sub_fmt);
+
+ if (strcmp(sub_fmt, "present") == 0)
+ {
+ is_present_flag = true;
+ tree->present = value;
+ }
+
+ if (!verbose && !tree->present)
+ return;
+
+ if (!is_present_flag)
+ object_name = append_object_to_format_string(tree, sub_fmt);
+
+ param = new_object(ObjTypeBool, object_name);
+ param->value.boolean = value;
+ append_premade_object(tree, param);
+}
It feels like there is some subtle trickery going on here with the
conditions. Is there a simpler way to write this, or maybe it is just
lacking some explanatory comments?
~~~
6. src/backend/commands/ddl_deparse.c - append_array_object
+ if (!verbose)
+ {
+ ListCell *lc;
+
+ /* Extract the ObjElems whose present flag is true */
+ foreach(lc, array)
+ {
+ ObjElem *elem = (ObjElem *) lfirst(lc);
+
+ Assert(elem->objtype == ObjTypeObject);
+
+ if (!elem->value.object->present)
+ array = foreach_delete_current(array, lc);
+ }
+
+ if (list_length(array) == 0)
+ return;
+ }
Maybe it is OK as-is. I'm just wondering if this list_length(array)
check should be outside of the !verbose check?
~~~
7. src/backend/commands/ddl_deparse.c - objtree_to_jsonb_element
+ case ObjTypeObject:
+ /* recursively add the object into the existing parse state */
Uppercase comment
~~~
8. src/backend/commands/ddl_deparse.c - new_objtree_for_qualname_id
8.1
+ *
+ * Elements "schemaname" and "objname" are set. If the object is a temporary
+ * object, the schema name is set to "pg_temp".
I'm not sure if this is the right place to say this, since it is not
really this function that sets that "pg_temp".
8.2
+ if (isnull)
+ elog(ERROR, "unexpected NULL namespace");
+ objname = heap_getattr(catobj, Anum_name, RelationGetDescr(catalog),
+ &isnull);
Missing blank line after the elog?
~~~
9. src/backend/commands/ddl_deparse.c - deparse_ColumnIdentity
+ /* Definition elemets */
typo "elemets"
~~~
10. src/backend/commands/ddl_deparse.c - deparse_CreateTrigStmt
10.1
+ else
+ elog(ERROR, "unrecognized trigger timing value %d", node->timing);
should that say "unrecognized trigger timing type" (e.g. type instead of value)
10.2
+ /*
+ * Decode the events that the trigger fires for. The output is a list;
+ * in most cases it will just be a string with the even name, but when
+ * there's an UPDATE with a list of columns, we return a JSON object.
+ */
"even name" -> "event name" ?
10.3
+ foreach(cell, node->columns)
+ {
+ char *colname = strVal(lfirst(cell));
+
+ cols = lappend(cols,
+ new_string_object(colname));
+ }
Unnecessary wrapping?
10.4
+ append_array_object(update, "%{columns:, }I", cols);
+
+ events = lappend(events,
+ new_object_object(update));
Unnecessary wrapping?
10.5
+ /* verify that the argument encoding is correct */
Uppercase comment
~~~
11. src/backend/commands/ddl_deparse.c - deparse_ColumnDef
+ saw_notnull = false;
+ foreach(cell, coldef->constraints)
+ {
+ Constraint *constr = (Constraint *) lfirst(cell);
+
+ if (constr->contype == CONSTR_NOTNULL)
+ saw_notnull = true;
+ }
Some similar functions here would 'break' when it finds the
saw_notnull. Why not break here?
~~~
12. src/backend/commands/ddl_deparse.c - obtainConstraints
12.1
+ /* only one may be valid */
Uppercase comment
12.2
+ /*
+ * scan pg_constraint to fetch all constraints linked to the given
+ * relation.
+ */
Uppercase comment
~~~
13. src/backend/commands/ddl_deparse.c - deparse_InhRelations
+/*
+ * Deparse the inherits relations.
+ *
+ * Given a table OID, return a schema qualified table list representing
+ * the parent tables.
+ */
I am not sure - should that say "Deparse the INHERITS relations." (uppercase)
~~~
14. src/backend/commands/ddl_deparse.c - pg_get_indexdef_detailed
14.1
+ * There is a huge lot of code that's a dupe of pg_get_indexdef_worker, but
+ * control flow is different enough that it doesn't seem worth keeping them
+ * together.
+ */
SUGGESTION
A large amount of code is duplicated from pg_get_indexdef_worker, ...
14.2
+ idxrelrec = (Form_pg_class) GETSTRUCT(ht_idxrel);
+
+
+ /*
+ * Fetch the pg_am tuple of the index' access method
+ */
Remove the extra blank line
~~~
15. src/backend/commands/ddl_deparse.c - deparse_IndexStmt
+ /*
+ * indexes for PRIMARY KEY and other constraints are output
+ * separately; return empty here.
+ */
Uppercase comment
~~~
16. src/backend/commands/ddl_deparse.c - deparse_FunctionSet
16.1
+static ObjTree *
+deparse_FunctionSet(VariableSetKind kind, char *name, char *value)
Missing function comment.
16.2
+ ObjTree *r;
Is 'r' the best name?
16.3
+ if (kind == VAR_RESET_ALL)
+ {
+ r = new_objtree("RESET ALL");
+ }
+ else if (value != NULL)
+ {
+ r = new_objtree_VA("SET %{set_name}I", 1,
+ "set_name", ObjTypeString, name);
+
+ /*
+ * Some GUC variable names are 'LIST' type and hence must not be
+ * quoted.
+ */
+ if (GetConfigOptionFlags(name, true) & GUC_LIST_QUOTE)
+ append_string_object(r, "TO %{set_value}s", value);
+ else
+ append_string_object(r, "TO %{set_value}L", value);
+ }
+ else
+ {
+ r = new_objtree("RESET");
+ append_string_object(r, "%{set_name}I", name);
+ }
It seems a bit strange that the kind of new_objtree is judged
sometimes by the *value. Why isn't this just always switching on the
different VariableSetKind?
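For example, a rough sketch of what I mean (covering only the kinds the current
code seems to handle; other kinds such as VAR_SET_DEFAULT or VAR_SET_CURRENT
would need their own cases or an error):

switch (kind)
{
    case VAR_RESET_ALL:
        r = new_objtree("RESET ALL");
        break;
    case VAR_SET_VALUE:
        r = new_objtree_VA("SET %{set_name}I", 1,
                           "set_name", ObjTypeString, name);
        if (GetConfigOptionFlags(name, true) & GUC_LIST_QUOTE)
            append_string_object(r, "TO %{set_value}s", value);
        else
            append_string_object(r, "TO %{set_value}L", value);
        break;
    case VAR_RESET:
        r = new_objtree("RESET");
        append_string_object(r, "%{set_name}I", name);
        break;
    default:
        elog(ERROR, "unexpected VariableSetKind %d", kind);
        break;
}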
~~~
17. src/backend/commands/ddl_deparse.c - deparse_CreateFunction
17.1
+/*
+ * deparse_CreateFunctionStmt
+ * Deparse a CreateFunctionStmt (CREATE FUNCTION)
+ *
+ * Given a function OID and the parsetree that created it, return the JSON
+ * blob representing the creation command.
+ */
+static ObjTree *
+deparse_CreateFunction(Oid objectId, Node *parsetree)
The name of the function and the name of the function in the comment
don't match. Suggest removing it from the comment.
17.2
+ /* get the pg_proc tuple */
Uppercase comment
17.3
+ /* get the corresponding pg_language tuple */
Uppercase comment
17.4
+ /* optional wholesale suppression of "name" occurs here */
Uppercase comment
17.5
+ append_string_object(createFunc, "%{volatility}s",
+ procForm->provolatile == PROVOLATILE_VOLATILE ?
+ "VOLATILE" :
+ procForm->provolatile == PROVOLATILE_STABLE ?
+ "STABLE" :
+ procForm->provolatile == PROVOLATILE_IMMUTABLE ?
+ "IMMUTABLE" : "INVALID VOLATILITY");
Does "INVALID VOLATILITY" make sense? Is that a real thing or should
this give ERROR?
17.6
+ foreach(cell, node->options)
+ {
+ DefElem *defel = (DefElem *) lfirst(cell);
+ ObjTree *tmp = NULL;
Why assign *tmp = NULL? I think it serves no purpose.
~~~
18. src/backend/commands/ddl_deparse.c - deparse_AlterFunction
18.1
+ * Deparse a AlterFunctionStmt (ALTER FUNCTION)
"a" -> "an"
18.2
+ /* get the pg_proc tuple */
Uppercase comment
18.3
+ alterFunc = new_objtree_VA("ALTER FUNCTION", 0);
+
+ params = NIL;
Why not just assign params = NIL at the declaration?
~~~
19. src/backend/commands/ddl_deparse.c - deparse_AlterOwnerStmt
+ * Deparse a AlterOwnerStmt (ALTER ... OWNER TO ...).
"a" -> "an"
~~~
20. src/backend/commands/ddl_deparse.c - deparse_AlterOperatorStmt
+ * Deparse a AlterOperatorStmt (ALTER OPERATOR ... SET ...).
"a" -> "an"
~~~
21. src/backend/commands/ddl_deparse.c - deparse_RenameStmt
21.1
+ * In a ALTER .. RENAME command, we don't have the original name of the
"a" -> "an"
21.2
+ relation_close(relation, AccessShareLock);
+
+ break;
+ case OBJECT_SCHEMA:
Misplaced blank line; it should be after the break;
21.3
+ break;
+ case OBJECT_TRIGGER:
Put blank line after break;
21.4
+ break;
+ default:
Put blank line after break;
~~~
22. src/backend/commands/ddl_deparse.c - deparse_Seq_Cache
+static inline ObjElem *
+deparse_Seq_Cache(ObjTree *parent, Form_pg_sequence seqdata, bool alter_table)
Is param 'alter_table’ correct here or should that be 'alter_sequence'
(or just 'alter')?
~~
23. src/backend/commands/ddl_deparse.c - deparse_Seq_Cycle
+static inline ObjElem *
+deparse_Seq_Cycle(ObjTree *parent, Form_pg_sequence seqdata, bool alter_table)
Is param 'alter_table’ correct here or should that be 'alter_sequence'
(or just 'alter')?
~~~
24. src/backend/commands/ddl_deparse.c - deparse_Seq_IncrementBy
+static inline ObjElem *
+deparse_Seq_IncrementBy(ObjTree *parent, Form_pg_sequence seqdata,
bool alter_table)
Is param 'alter_table’ correct here or should that be 'alter_sequence'
(or just 'alter')?
~~~
25. src/backend/commands/ddl_deparse.c - deparse_Seq_Minvalue
+static inline ObjElem *
+deparse_Seq_Minvalue(ObjTree *parent, Form_pg_sequence seqdata, bool
alter_table)
Is param 'alter_table’ correct here or should that be 'alter_sequence'
(or just 'alter')?
~~~
26. src/backend/commands/ddl_deparse.c - deparse_Seq_Maxvalue
+static inline ObjElem *
+deparse_Seq_Maxvalue(ObjTree *parent, Form_pg_sequence seqdata, bool
alter_table)
Is param 'alter_table’ correct here or should that be 'alter_sequence'
(or just 'alter')?
~~~
27. src/backend/commands/ddl_deparse.c - deparse_Seq_Startwith
+static inline ObjElem *
+deparse_Seq_Startwith(ObjTree *parent, Form_pg_sequence seqdata, bool
alter_table)
Is param 'alter_table’ correct here or should that be 'alter_sequence'
(or just 'alter')?
~~~
28. src/backend/commands/ddl_deparse.c - deparse_CreateSchemaStmt
+/*
+ * Deparse a CreateSchemaStmt.
+ *
+ * Given a schema OID and the parsetree that created it, return an ObjTree
+ * representing the creation command.
+ *
+ * Note we don't output the schema elements given in the creation command.
+ * They must be output separately. (In the current implementation,
+ * CreateSchemaCommand passes them back to ProcessUtility, which will lead to
+ * this file if appropriate.)
+ */
"this file" ??
~~~
29. src/backend/commands/ddl_deparse.c - deparse_Seq_OwnedBy
29.1
+static ObjElem *
+deparse_Seq_OwnedBy(ObjTree *parent, Oid sequenceId, bool alter_table)
Missing function comment.
29.2
+ /* only consider AUTO dependencies on pg_class */
Uppercase comment.
29.3
+ if (!ownedby)
+ /* XXX this shouldn't happen ... */
+ ownedby = new_objtree_VA("OWNED BY %{owner}D",
+ 3,
+ "clause", ObjTypeString, "owned",
+ "owner", ObjTypeNull,
+ "present", ObjTypeBool, false);
+ return new_object_object(ownedby);
Put blank line before return;
~~~
30. src/backend/commands/ddl_deparse.c - deparse_CreateSeqStmt
30.1
+ /* definition elemets */
Uppercase comment, and typo "elemets"
30.2
+ /* we purposefully do not emit OWNED BY here */
Uppercase comment
~~~
31. src/backend/commands/ddl_deparse.c - deparse_AlterTableStmt
31.1
+ tmp = new_objtree_VA("ADD CONSTRAINT %{name}I",
+ 2,
+ "type", ObjTypeString, "add constraint using index",
+ "name", ObjTypeString, get_constraint_name(constrOid));
I think it was customary for you to put the number of varargs on the
1st line, not like this. There are several others like this in this
function which should also be changed (where they fit OK on the first
line).
31.2
+ append_array_object(alterTableStmt, "%{subcmds:, }s", subcmds);
+ return alterTableStmt;
Maybe add blank line before the return;
~~~
32. src/backend/commands/ddl_deparse.c - deparse_AlterOpFamily
32.1
+ list = NIL;
+ foreach(cell, cmd->d.opfam.operators)
Why not assign list = NIL with the declaration?
32.2
+ proargtypes = procForm->proargtypes.values;
+ arglist = NIL;
+ for (i = 0; i < procForm->pronargs; i++)
Why not assign arglist = NIL with the declaration?
32.3
+ if (!stmt->isDrop)
+ append_format_string(alterOpFam, "ADD");
+ else
+ append_format_string(alterOpFam, "DROP");
Maybe simpler to reverse these; IMO having isDrop mean "DROP" makes more sense.
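i.e. (sketch):

if (stmt->isDrop)
    append_format_string(alterOpFam, "DROP");
else
    append_format_string(alterOpFam, "ADD");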
~~~
33. src/backend/commands/ddl_deparse.c - deparse_DefineStmt_Operator
+ /* Add the definition clause */
+ list = NIL;
Why not assign list = NIL with the declaration?
~~~
34. src/backend/commands/ddl_deparse.c - deparse_DefineStmt
+ default:
+ elog(ERROR, "unsupported object kind");
+ return NULL;
What purpose does this return serve after the ERROR? If you have to do
something to quieten the compiler, maybe it's better to set defStmt =
NULL at declaration.
======
35. src/backend/commands/ddl_json.c - <General>
In the errmsg text some of the messages say JSON (uppercase) and some
say json (lowercase). IMO they should all be consistent. Maybe say
JSON, since that way seems dominant.
~~~
36. src/backend/commands/ddl_json.c - enum
+typedef enum
+{
+ tv_absent,
+ tv_true,
+ tv_false
+} trivalue;
It seems there is another enum elsewhere already called this, because
you did not add it into the typedefs.list, yet it is already there. Is
that OK? Maybe this should have a unique name for this module.
~~~
37. src/backend/commands/ddl_json.c - expand_fmt_recursive
37.1
+ is_array = false;
+
+ ADVANCE_PARSE_POINTER(cp, end_ptr);
Why not assign is_array = false in the declaration?
37.2
+ /*
+ * Validate that we got an array if the format string specified one.
+ * And finally print out the data
+ */
+ if (is_array)
+ expand_jsonb_array(buf, param, value, arraysep, specifier, start_ptr);
+ else
+ expand_one_jsonb_element(buf, param, value, specifier, start_ptr);
"Print out" the data? This comment seems a bit over-complicated.
Perhaps these sentences can be combined and re-worded a bit.
SUGGESTION (maybe?)
Expand the data (possibly an array) into the output StringInfo.
~~~
38. src/backend/commands/ddl_json.c - expand_jsonval_identifier
+/*
+ * Expand a json value as an identifier. The value must be of type string.
+ */
+static void
+expand_jsonval_identifier(StringInfo buf, JsonbValue *jsonval)
Should that say "as a quoted identifier" ?
~~~
39. src/backend/commands/ddl_json.c - expand_jsonval_typename
39.1
+ switch (is_array)
+ {
+ default:
+ case tv_absent:
It seems slightly unusual for the default case to not be the last
switch case. Consider rearranging it.
39.2
+ if (schema == NULL)
+ appendStringInfo(buf, "%s%s%s",
+ quote_identifier(typename),
+ typmodstr ? typmodstr : "",
+ array_decor);
+
+ /* Special typmod needs */
+ else if (schema[0] == '\0')
+ appendStringInfo(buf, "%s%s%s",
+ typename,
+ typmodstr ? typmodstr : "",
+ array_decor);
+ else
+ appendStringInfo(buf, "%s.%s%s%s",
+ quote_identifier(schema),
+ quote_identifier(typename),
+ typmodstr ? typmodstr : "",
+ array_decor);
The last 2 parts:
typmodstr ? typmodstr : "",
array_decor);
are common for all those above appendStringInfo, so you could reduce
the code (if you want to) and just add the common parts at the end.
e.g.
if (schema == NULL)
appendStringInfo(buf, "%s", quote_identifier(typename));
else if (schema[0] == '\0')
appendStringInfo(buf, "%s", typename); /* Special typmod needs */
else
appendStringInfo(buf, "%s.%s", quote_identifier(schema),
quote_identifier(typename));
appendStringInfo(buf, "%s%s", typmodstr ? typmodstr : "", array_decor);
39.3
In other code (e.g. expand_jsonval_dottedname) you did lots of
pfree(str) after using the strings, so why not similar here?
~~~
40. src/backend/commands/ddl_json.c - expand_jsonval_operator
40.1
+ /* schema might be NULL or empty */
Uppercase comment
40.2
Why no pfree(str) here similar to what there was in prior code (e.g.
expand_jsonval_dottedname)?
~~~
41. src/backend/commands/ddl_json.c - expand_jsonval_string
Comment says "The value must be of type string or of type object."
Yeah, but what if it isn't? This code will just fall through and return
true. Is that the right behaviour? Should there be an Assert at least?
~~~
42. src/backend/commands/ddl_json.c - expand_jsonval_number
Does this need some pfree after the string is copied to 'buf'?
~~~
43 src/backend/commands/ddl_json.c - expand_jsonval_role
+ rolename = find_string_in_jsonbcontainer(jsonval->val.binary.data,
+ "rolename", false, NULL);
+ appendStringInfoString(buf, quote_identifier(rolename));
Does this need some pfree after the string is copied to 'buf'?
~~~
44. src/backend/commands/ddl_json.c - ddl_deparse_json_to_string
+ d = DirectFunctionCall1(jsonb_in,
+ PointerGetDatum(json_str));
Seems unnecessary wrapping here.
~~~
45. src/backend/commands/ddl_json.c - fmtstr_error_callback
45.1
+/*
+ * Error context callback for JSON format string expansion.
+ *
+ * Possible improvement: indicate which element we're expanding, if applicable.
+ */
Should that "Possible improvement" comment have "XXX" prefix like most
other possible improvement comments have?
45.2
+fmtstr_error_callback(void *arg)
+{
+ errcontext("while expanding format string \"%s\"", (char *) arg);
+
+}
Remove the blank line.
======
46. src/backend/utils/adt/ruleutils.c - pg_get_trigger_whenclause
+char *
+pg_get_trigger_whenclause(Form_pg_trigger trigrec, Node *whenClause,
bool pretty)
Missing function comment
~~~
47. src/backend/utils/adt/ruleutils.c - print_function_sqlbody
@@ -3513,7 +3526,7 @@ pg_get_function_arg_default(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(string_to_text(str));
}
-static void
+void
print_function_sqlbody(StringInfo buf, HeapTuple proctup)
{
Having a function comment is more important now that this is no longer static.
------
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-08-08 20:22:33 |
| Message-ID: | CAAD30UL6CVphMN3Fn3iwfU=oh0GOasygxM19ZCzB6o+RZkD49Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> In general, I agree with your comment below that we can work on this
> after we have some more concrete plans/discussions. I think we can
> probably consider this when we have more discussion around the
> publication commands for the DDL objects. However, it would be good if
> you can add some more details about the above two options.
Thanks for the support. I have started a new thread on supporting replication of
global object commands and have added a more detailed discussion:
https://www.postgresql.org/message-id/CAAD30UKD7YPEbYcs_L9PYLcLZjnxyqO%3DJF5_mnAwx7g_PtOi3A%40mail.gmail.com
> I noticed that LogLogicalDDLMessage() uses MyDatabaseId and then
> decoding can take some action (filtering) based on that. Is it Okay to
> use that function for global objects, if so, you might want to add a
> comment for the same?
Could you elaborate on the concern? The dbid filtering happens in
logicalddlmsg_decode but I don't see why I can't use LogLogicalDDLMessage
to log global commands. Are there any global commands that are
non-transactional?
logicalddlmsg_decode:
    if (message->dbId != ctx->slot->data.database ||
        FilterByOrigin(ctx, origin_id))
        return;
> As per my understanding, the overall work on this project includes the
> following sub-tasks:
> a. DDL Deparsing: This is required to support DDL replication of
> non-global objects. The work for this is in progress, this is based on
> prior work by Alvaro.
> b. DDL Replication: This includes replication of DDL commands based on
> event triggers and DDL deparsing. The work on this is also in
> progress.
> c. DDL Replication of global objects: It requires a different approach
> due to the reasons quoted above in your email. Zheng has started
> working on it.
> d. Initial Sync: I think there is a brief discussion about this in the
> thread but no concrete proposal yet. I think it is important to solve
> this part of the puzzle as well to have an overall design ready for
> this project. Do let me know if you, Sawada-San, or anybody else
> intends to work on it? I think that will avoid overlap of work.
Euler mentioned that he plans to work on the initial schema sync in
[1]. We can help with this effort as well.
Regards,
Zheng
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-08-12 10:32:35 |
| Message-ID: | CAA4eK1+Dm3RU03Xgim4N5-vVADzrvKpetFrv3o0c5EKmnEy5rg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, Aug 9, 2022 at 1:52 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> > In general, I agree with your comment below that we can work on this
> > after we have some more concrete plans/discussions. I think we can
> > probably consider this when we have more discussion around the
> > publication commands for the DDL objects. However, it would be good if
> > you can add some more details about the above two options.
>
> Thanks for the support. I have started a new thread on supporting replication of
> global object command and have added more detailed discussion:
> https://www.postgresql.org/message-id/CAAD30UKD7YPEbYcs_L9PYLcLZjnxyqO%3DJF5_mnAwx7g_PtOi3A%40mail.gmail.com
>
> > I noticed that LogLogicalDDLMessage() uses MyDatabaseId and then
> > decoding can take some action (filtering) based on that. Is it Okay to
> > use that function for global objects, if so, you might want to add a
> > comment for the same?
>
> Could you elaborate on the concern? The dbid filtering happens in
> logicalddlmsg_decode but I don't see why I can't use LogLogicalDDLMessage
> to log global commands.
>
I thought one may not want to tie replication of global objects to
any particular database. I read your ideas on the thread [1] about the
same but am not sure of the best way forward for it.
--
With Regards,
Amit Kapila.
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-08-13 15:26:30 |
| Message-ID: | CAFPTHDZ+rCCM7wCo_YBH4-rUZkjkQKdCwGmGWeY0REEVuieoHg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Aug 5, 2022 at 4:03 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Hou-san, here are my review comments for the patch v15-0001:
>
> ======
>
> 1. Commit Message
>
> CREATE/ALTER/DROP TABLE (*)
>
> At first, I thought "(*)" looks like a SQL syntax element.
>
> SUGGESTION:
>
> CREATE/ALTER/DROP TABLE - - Note #1, Note #2
> ...
> Note #1 – blah blah
> Note #2 – yada yada
>
> ======
fixed
>
> 2. src/backend/commands/ddl_deparse.c - General
>
> 2.1
> Lots of the deparse_XXX functions are in random places scattered around
> in this module. Since there are so many, I think it's better to have
> the functions arranged alphabetically to make them easier to find. (And if
> there are several functions that logically "belong" together then
> those should be re-named so they will be grouped together
> alphabetically...)
>
> Same applies to other functions – not just the deparse_XXX ones
fixed
>
> 2.2
> There are lots of 'tmp' (or 'tmp2') variables in this file. Sometimes
> 'tmp' is appropriate (or maybe 'tmpobj' would be better) but in other
> cases it seems like there should be a better name than 'tmp'. Please
> search all these and replace where you can use a more meaningful name
> than tmp.
>
changed to tmpobj
> 2.3
> Pointer NULL comparisons are not done consistently all through the
> file. E.g. Sometimes you do tree->fmtinfo == NULL, but in others you
> do like if (!tree->fmtinfo). It's no big deal whichever way you want
> to use, but at least for the comparisons involving the same variables
> IMO should use the same style consistently.
>
> ~~~
fixed.
>
> 3. src/backend/commands/ddl_deparse.c - format_type_detailed
>
> 3.1
> + * - typename is set to the type name, without quotes
>
> But the param is called 'typname', not 'typename'
>
> 3.2
> + * - typmod is set to the typemod, if any, as a string with parens
>
> I think you mean:
> "typmod is set" -> "typemodstr is set"
>
> 3.3
> + if (IsTrueArrayType(typeform) &&
> + typeform->typstorage != TYPSTORAGE_PLAIN)
> + {
> + /* Switch our attention to the array element type */
> + ReleaseSysCache(tuple);
> + tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(array_base_type));
> + if (!HeapTupleIsValid(tuple))
> + elog(ERROR, "cache lookup failed for type %u", type_oid);
> +
> + typeform = (Form_pg_type) GETSTRUCT(tuple);
> + type_oid = array_base_type;
> + *typarray = true;
> + }
> + else
> + *typarray = false;
>
> Maybe this if/else can be simplified
>
fixed.
> *typarray = IsTrueArrayType(typeform) && typeform->typstorage !=
> TYPSTORAGE_PLAIN;
> if (*typarray)
> {
> ...
> }
>
> 3.4
> + /* otherwise, WITH TZ is added by typmod. */
>
> Uppercase comment
>
> ~~~
fixed.
>
> 4. src/backend/commands/ddl_deparse.c - append_object_to_format_string
>
> + for (cp = sub_fmt; cp < end_ptr; cp++)
> + {
> + if (*cp == '{')
> + {
> + start_copy = true;
> + continue;
> + }
>
> What's this logic going to do if it encounters "{{"? It looks like it
> will just keep going, but wouldn't it be an error for a name to have a
> "{" in it?
I think this logic expects single braces.
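For context, the deparse format strings this code walks over look roughly
like the following (an illustration of the %{name}X convention only, not a
string taken verbatim from the patch), so a literal '{' should only ever
open a parameter name, which is presumably why double braces are not
handled:

    /* hypothetical format string: braces only delimit parameter names */
    "ALTER TABLE %{identity}D ADD COLUMN %{definition}s"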
>
> ~~~
>
> 5. src/backend/commands/ddl_deparse.c - append_bool_object
>
> +append_bool_object(ObjTree *tree, char *sub_fmt, bool value)
> +{
> + ObjElem *param;
> + char *object_name = sub_fmt;
> + bool is_present_flag = false;
> +
> + Assert(sub_fmt);
> +
> + if (strcmp(sub_fmt, "present") == 0)
> + {
> + is_present_flag = true;
> + tree->present = value;
> + }
> +
> + if (!verbose && !tree->present)
> + return;
> +
> + if (!is_present_flag)
> + object_name = append_object_to_format_string(tree, sub_fmt);
> +
> + param = new_object(ObjTypeBool, object_name);
> + param->value.boolean = value;
> + append_premade_object(tree, param);
> +}
>
> It feels like there is some subtle trickery going on here with the
> conditions. Is there a simpler way to write this, or maybe it is just
> lacking some explanatory comments?
>
> ~~~
Added a comment.
>
> 6. src/backend/commands/ddl_deparse.c - append_array_object
>
> + if (!verbose)
> + {
> + ListCell *lc;
> +
> + /* Extract the ObjElems whose present flag is true */
> + foreach(lc, array)
> + {
> + ObjElem *elem = (ObjElem *) lfirst(lc);
> +
> + Assert(elem->objtype == ObjTypeObject);
> +
> + if (!elem->value.object->present)
> + array = foreach_delete_current(array, lc);
> + }
> +
> + if (list_length(array) == 0)
> + return;
> + }
>
> Maybe it is OK as-is. I'm just wondering if this list_length(array)
> check should be outside of the !verbose check?
>
> ~~~
fixed.
>
> 7. src/backend/commands/ddl_deparse.c - objtree_to_jsonb_element
>
> + case ObjTypeObject:
> + /* recursively add the object into the existing parse state */
>
> Uppercase comment
>
> ~~~
>
> 8. src/backend/commands/ddl_deparse.c - new_objtree_for_qualname_id
>
> 8.1
> + *
> + * Elements "schemaname" and "objname" are set. If the object is a temporary
> + * object, the schema name is set to "pg_temp".
>
> I'm not sure if this is the right place to say this, since it is not
> really this function that sets that "pg_temp".
>
> 8.2
> + if (isnull)
> + elog(ERROR, "unexpected NULL namespace");
> + objname = heap_getattr(catobj, Anum_name, RelationGetDescr(catalog),
> + &isnull);
>
> Missing blank line after the elog?
>
> ~~~
>
> 9. src/backend/commands/ddl_deparse.c - deparse_ColumnIdentity
>
> + /* Definition elemets */
>
> typo "elemets"
>
> ~~~
>
> 10. src/backend/commands/ddl_deparse.c - deparse_CreateTrigStmt
>
> 10.1
> + else
> + elog(ERROR, "unrecognized trigger timing value %d", node->timing);
>
> should that say "unrecognized trigger timing type" (e.g. type instead of value)
>
> 10.2
> + /*
> + * Decode the events that the trigger fires for. The output is a list;
> + * in most cases it will just be a string with the even name, but when
> + * there's an UPDATE with a list of columns, we return a JSON object.
> + */
>
> "even name" -> "event name" ?
>
> 10.3
> + foreach(cell, node->columns)
> + {
> + char *colname = strVal(lfirst(cell));
> +
> + cols = lappend(cols,
> + new_string_object(colname));
> + }
>
> Unnecessary wrapping?
>
> 10.4
> + append_array_object(update, "%{columns:, }I", cols);
> +
> + events = lappend(events,
> + new_object_object(update));
>
> Unnecessary wrapping?
>
> 10.5
> + /* verify that the argument encoding is correct */
>
> Uppercase comment
>
> ~~~
fixed all the above comments.
>
> 11. src/backend/commands/ddl_deparse.c - deparse_ColumnDef
>
> + saw_notnull = false;
> + foreach(cell, coldef->constraints)
> + {
> + Constraint *constr = (Constraint *) lfirst(cell);
> +
> + if (constr->contype == CONSTR_NOTNULL)
> + saw_notnull = true;
> + }
>
> Some similar functions here would 'break' once they find the
> NOT NULL (saw_notnull). Why not break here?
>
> ~~~
I think the comment explains why this "not null" is different.
>
> 12. src/backend/commands/ddl_deparse.c - obtainConstraints
>
> 12.1
> + /* only one may be valid */
>
> Uppercase comment
>
> 12.2
> + /*
> + * scan pg_constraint to fetch all constraints linked to the given
> + * relation.
> + */
>
> Uppercase comment
>
> ~~~
>
> 13. src/backend/commands/ddl_deparse.c - deparse_InhRelations
>
> +/*
> + * Deparse the inherits relations.
> + *
> + * Given a table OID, return a schema qualified table list representing
> + * the parent tables.
> + */
>
> I am not sure - should that say "Deparse the INHERITS relations." (uppercase)
>
> ~~~
>
> 14. src/backend/commands/ddl_deparse.c - pg_get_indexdef_detailed
>
> 14.1
> + * There is a huge lot of code that's a dupe of pg_get_indexdef_worker, but
> + * control flow is different enough that it doesn't seem worth keeping them
> + * together.
> + */
>
> SUGGESTION
> A large amount of code is duplicated from pg_get_indexdef_worker, ...
>
> 14.2
> + idxrelrec = (Form_pg_class) GETSTRUCT(ht_idxrel);
> +
> +
> + /*
> + * Fetch the pg_am tuple of the index' access method
> + */
>
> Remove the extra blank line
>
> ~~~
>
> 15. src/backend/commands/ddl_deparse.c - deparse_IndexStmt
>
> + /*
> + * indexes for PRIMARY KEY and other constraints are output
> + * separately; return empty here.
> + */
>
> Uppercase comment
>
> ~~~
>
> 16. src/backend/commands/ddl_deparse.c - deparse_FunctionSet
>
> 16.1
> +static ObjTree *
> +deparse_FunctionSet(VariableSetKind kind, char *name, char *value)
>
> Missing function comment.
>
> 16.2
> + ObjTree *r;
>
> Is 'r' the best name?
fixed all the above comments.
>
> 16.3
> + if (kind == VAR_RESET_ALL)
> + {
> + r = new_objtree("RESET ALL");
> + }
> + else if (value != NULL)
> + {
> + r = new_objtree_VA("SET %{set_name}I", 1,
> + "set_name", ObjTypeString, name);
> +
> + /*
> + * Some GUC variable names are 'LIST' type and hence must not be
> + * quoted.
> + */
> + if (GetConfigOptionFlags(name, true) & GUC_LIST_QUOTE)
> + append_string_object(r, "TO %{set_value}s", value);
> + else
> + append_string_object(r, "TO %{set_value}L", value);
> + }
> + else
> + {
> + r = new_objtree("RESET");
> + append_string_object(r, "%{set_name}I", name);
> + }
>
> It seems a bit strange that the kind of new_objtree is judged
> sometimes by the *value. Why isn't this just always switching on the
> different VariableSetKind?
Checking on VariableSetKind requires checking multiple conditions;
this is a simpler comparison.
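For illustration, switching on the kind directly might look like the sketch
below (this reuses the calls from the quoted code and is not the actual
patch code; as noted, VAR_SET_VALUE alone may not cover every value != NULL
case, which is the extra checking being referred to):

    switch (kind)
    {
        case VAR_RESET_ALL:
            r = new_objtree("RESET ALL");
            break;
        case VAR_SET_VALUE:
            r = new_objtree_VA("SET %{set_name}I", 1,
                               "set_name", ObjTypeString, name);
            /* GUC_LIST_QUOTE variables must not be quoted as literals */
            if (GetConfigOptionFlags(name, true) & GUC_LIST_QUOTE)
                append_string_object(r, "TO %{set_value}s", value);
            else
                append_string_object(r, "TO %{set_value}L", value);
            break;
        default:        /* VAR_RESET and the remaining kinds */
            r = new_objtree("RESET");
            append_string_object(r, "%{set_name}I", name);
            break;
    }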
>
> ~~~
>
> 17. src/backend/commands/ddl_deparse.c - deparse_CreateFunction
>
> 17.1
> +/*
> + * deparse_CreateFunctionStmt
> + * Deparse a CreateFunctionStmt (CREATE FUNCTION)
> + *
> + * Given a function OID and the parsetree that created it, return the JSON
> + * blob representing the creation command.
> + */
> +static ObjTree *
> +deparse_CreateFunction(Oid objectId, Node *parsetree)
>
> The name of the function and the name of the function in the comment
> don't match. Suggest removing it from the comment.
>
> 17.2
> + /* get the pg_proc tuple */
>
> Uppercase comment
>
> 17.3
> + /* get the corresponding pg_language tuple */
>
> Uppercase comment
>
> 17.4
> + /* optional wholesale suppression of "name" occurs here */
>
> Uppercase comment
>
> 17.5
> + append_string_object(createFunc, "%{volatility}s",
> + procForm->provolatile == PROVOLATILE_VOLATILE ?
> + "VOLATILE" :
> + procForm->provolatile == PROVOLATILE_STABLE ?
> + "STABLE" :
> + procForm->provolatile == PROVOLATILE_IMMUTABLE ?
> + "IMMUTABLE" : "INVALID VOLATILITY");
>
> Does "INVALID VOLATILITY" make sense? Is that a real thing or should
> this give ERROR?
fixed the above comments, changed the logic here.
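One possible shape for the changed logic, sketched only for illustration
(the actual change in the next patch version may differ): map each
PROVOLATILE_* value explicitly and error out on anything unexpected
instead of emitting "INVALID VOLATILITY".

    switch (procForm->provolatile)
    {
        case PROVOLATILE_VOLATILE:
            append_string_object(createFunc, "%{volatility}s", "VOLATILE");
            break;
        case PROVOLATILE_STABLE:
            append_string_object(createFunc, "%{volatility}s", "STABLE");
            break;
        case PROVOLATILE_IMMUTABLE:
            append_string_object(createFunc, "%{volatility}s", "IMMUTABLE");
            break;
        default:
            elog(ERROR, "invalid provolatile value %c", procForm->provolatile);
    }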
>
> 17.6
> + foreach(cell, node->options)
> + {
> + DefElem *defel = (DefElem *) lfirst(cell);
> + ObjTree *tmp = NULL;
>
> Why assign *tmp = NULL? I think it serves no purpose.
>
> ~~~
>
> 18. src/backend/commands/ddl_deparse.c - deparse_AlterFunction
>
> 18.1
> + * Deparse a AlterFunctionStmt (ALTER FUNCTION)
>
> "a" -> "an"
>
> 18.2
> + /* get the pg_proc tuple */
>
> Uppercase comment
>
> 18.3
> + alterFunc = new_objtree_VA("ALTER FUNCTION", 0);
> +
> + params = NIL;
>
> Why not just assign params = NIL at the declaration?
>
> ~~~
>
> 19. src/backend/commands/ddl_deparse.c - deparse_AlterOwnerStmt
>
> + * Deparse a AlterOwnerStmt (ALTER ... OWNER TO ...).
>
> "a" -> "an"
>
> ~~~
>
> 20. src/backend/commands/ddl_deparse.c - deparse_AlterOperatorStmt
>
> + * Deparse a AlterOperatorStmt (ALTER OPERATOR ... SET ...).
>
> "a" -> "an"
>
> ~~~
>
> 21. src/backend/commands/ddl_deparse.c - deparse_RenameStmt
>
> 21.1
> + * In a ALTER .. RENAME command, we don't have the original name of the
>
> "a" -> "an"
>
> 21.2
> + relation_close(relation, AccessShareLock);
> +
> + break;
> + case OBJECT_SCHEMA:
>
> Misplaced blank line; should be after the break;
>
> 21.3
> + break;
> + case OBJECT_TRIGGER:
>
> Put blank line after break;
>
> 21.4
> + break;
> + default:
>
> Put blank line after break;
>
> ~~~
fixed the above comments.
>
> 22. src/backend/commands/ddl_deparse.c - deparse_Seq_Cache
>
> +static inline ObjElem *
> +deparse_Seq_Cache(ObjTree *parent, Form_pg_sequence seqdata, bool alter_table)
>
> Is param 'alter_table' correct here or should that be 'alter_sequence'
> (or just 'alter')?
>
> ~~
>
> 23. src/backend/commands/ddl_deparse.c - deparse_Seq_Cycle
>
> +static inline ObjElem *
> +deparse_Seq_Cycle(ObjTree *parent, Form_pg_sequence seqdata, bool alter_table)
>
> Is param 'alter_table' correct here or should that be 'alter_sequence'
> (or just 'alter')?
>
> ~~~
>
> 24. src/backend/commands/ddl_deparse.c - deparse_Seq_IncrementBy
>
> +static inline ObjElem *
> +deparse_Seq_IncrementBy(ObjTree *parent, Form_pg_sequence seqdata,
> bool alter_table)
>
> Is param 'alter_table' correct here or should that be 'alter_sequence'
> (or just 'alter')?
>
> ~~~
>
> 25. src/backend/commands/ddl_deparse.c - deparse_Seq_Minvalue
>
> +static inline ObjElem *
> +deparse_Seq_Minvalue(ObjTree *parent, Form_pg_sequence seqdata, bool
> alter_table)
>
> Is param 'alter_table' correct here or should that be 'alter_sequence'
> (or just 'alter')?
>
> ~~~
>
> 26. src/backend/commands/ddl_deparse.c - deparse_Seq_Maxvalue
>
> +static inline ObjElem *
> +deparse_Seq_Maxvalue(ObjTree *parent, Form_pg_sequence seqdata, bool
> alter_table)
>
> Is param 'alter_table' correct here or should that be 'alter_sequence'
> (or just 'alter')?
>
> ~~~
>
> 27. src/backend/commands/ddl_deparse.c - deparse_Seq_Startwith
>
> +static inline ObjElem *
> +deparse_Seq_Startwith(ObjTree *parent, Form_pg_sequence seqdata, bool
> alter_table)
>
> Is param 'alter_table' correct here or should that be 'alter_sequence'
> (or just 'alter')?
>
> ~~~
This actually refers to whether the original command came from an
ALTER TABLE or not.
>
> 28. src/backend/commands/ddl_deparse.c - deparse_CreateSchemaStmt
>
> +/*
> + * Deparse a CreateSchemaStmt.
> + *
> + * Given a schema OID and the parsetree that created it, return an ObjTree
> + * representing the creation command.
> + *
> + * Note we don't output the schema elements given in the creation command.
> + * They must be output separately. (In the current implementation,
> + * CreateSchemaCommand passes them back to ProcessUtility, which will lead to
> + * this file if appropriate.)
> + */
>
> "this file" ??
>
> ~~~
removed this.
>
> 29. src/backend/commands/ddl_deparse.c - deparse_Seq_OwnedBy
>
> 29.1
> +static ObjElem *
> +deparse_Seq_OwnedBy(ObjTree *parent, Oid sequenceId, bool alter_table)
>
> Missing function comment.
>
> 29.2
> + /* only consider AUTO dependencies on pg_class */
>
> Uppercase comment.
>
> 29.3
> + if (!ownedby)
> + /* XXX this shouldn't happen ... */
> + ownedby = new_objtree_VA("OWNED BY %{owner}D",
> + 3,
> + "clause", ObjTypeString, "owned",
> + "owner", ObjTypeNull,
> + "present", ObjTypeBool, false);
> + return new_object_object(ownedby);
>
> Put blank line before return;
>
> ~~~
>
> 30. src/backend/commands/ddl_deparse.c - deparse_CreateSeqStmt
>
> 30.1
> + /* definition elemets */
>
> Uppercase comment, and typo "elemets"
>
> 30.2
> + /* we purposefully do not emit OWNED BY here */
>
> Uppercase comment
>
> ~~~
>
> 31. src/backend/commands/ddl_deparse.c - deparse_AlterTableStmt
>
> 31.1
> + tmp = new_objtree_VA("ADD CONSTRAINT %{name}I",
> + 2,
> + "type", ObjTypeString, "add constraint using index",
> + "name", ObjTypeString, get_constraint_name(constrOid));
>
> I think it was customary for you to put the number of varargs on the
> 1st line, not like this. There are several others like this in this
> function which should also be changed (where they fit OK on the first
> line).
>
> 31.2
> + append_array_object(alterTableStmt, "%{subcmds:, }s", subcmds);
> + return alterTableStmt;
>
> Maybe add blank line before the return;
>
> ~~~
>
> 32. src/backend/commands/ddl_deparse.c - deparse_AlterOpFamily
>
> 32.1
> + list = NIL;
> + foreach(cell, cmd->d.opfam.operators)
>
> Why not assign list = NIL with the declaration?
>
> 32.2
> + proargtypes = procForm->proargtypes.values;
> + arglist = NIL;
> + for (i = 0; i < procForm->pronargs; i++)
>
> Why not assign arglist = NIL with the declaration?
>
> 32.3
> + if (!stmt->isDrop)
> + append_format_string(alterOpFam, "ADD");
> + else
> + append_format_string(alterOpFam, "DROP");
>
> Maybe simpler to reverse these; IMO having isDrop mean "DROP" makes more sense.
>
> ~~~
>
> 33. src/backend/commands/ddl_deparse.c - deparse_DefineStmt_Operator
>
> + /* Add the definition clause */
> + list = NIL;
>
> Why not assign list = NIL with the declaration?
>
> ~~~
fixed all the above comments.
>
> 34. src/backend/commands/ddl_deparse.c - deparse_DefineStmt
>
> + default:
> + elog(ERROR, "unsupported object kind");
> + return NULL;
>
> What purpose does this return serve after the ERROR? If you have to do
> something to quieten the compiler, maybe it's better to set defStmt =
> NULL at declaration.
>
> ======
>
> 35. src/backend/commands/ddl_json.c - <General>
>
> In the errmsg text some of the messages say JSON (uppercase) and some
> say json (lowercase). IMO they should all be consistent. Maybe say
> JSON, since that way seems dominant.
>
> ~~~
>
> 36. src/backend/commands/ddl_json.c - enum
>
> +typedef enum
> +{
> + tv_absent,
> + tv_true,
> + tv_false
> +} trivalue;
>
> It seems there is another enum elsewhere already called this, because
> you did not add it into the typedefs.list, yet it is already there. Is
> that OK? Maybe this should have a unique name for this module.
>
> ~~~
changed this to say json_trivalue
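So, assuming only the typedef name changes, the enum presumably now reads
something like:

    typedef enum
    {
        tv_absent,
        tv_true,
        tv_false
    } json_trivalue;    /* renamed from "trivalue", unique to ddl_json.c */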
>
> 37. src/backend/commands/ddl_json.c - expand_fmt_recursive
>
> 37.1
> + is_array = false;
> +
> + ADVANCE_PARSE_POINTER(cp, end_ptr);
>
> Why not assign is_array = false in the declaration?
>
> 37.2
> + /*
> + * Validate that we got an array if the format string specified one.
> + * And finally print out the data
> + */
> + if (is_array)
> + expand_jsonb_array(buf, param, value, arraysep, specifier, start_ptr);
> + else
> + expand_one_jsonb_element(buf, param, value, specifier, start_ptr);
>
> "Print out" the data? This comment seems a bit over-complicated.
> Perhaps these sentences can be combined and re-worded a bit.
>
> SUGGESTION (maybe?)
> Expand the data (possibly an array) into the output StringInfo.
>
> ~~~
>
> 38. src/backend/commands/ddl_json.c - expand_jsonval_identifier
>
> +/*
> + * Expand a json value as an identifier. The value must be of type string.
> + */
> +static void
> +expand_jsonval_identifier(StringInfo buf, JsonbValue *jsonval)
>
> Should that say "as a quoted identifier" ?
>
> ~~~
>
> 39. src/backend/commands/ddl_json.c - expand_jsonval_typename
>
> 39.1
> + switch (is_array)
> + {
> + default:
> + case tv_absent:
>
> It seems slightly unusual for the default case to not be the last
> switch case. Consider rearranging it.
>
>
fixed this.
> 39.2
> + if (schema == NULL)
> + appendStringInfo(buf, "%s%s%s",
> + quote_identifier(typename),
> + typmodstr ? typmodstr : "",
> + array_decor);
> +
> + /* Special typmod needs */
> + else if (schema[0] == '\0')
> + appendStringInfo(buf, "%s%s%s",
> + typename,
> + typmodstr ? typmodstr : "",
> + array_decor);
> + else
> + appendStringInfo(buf, "%s.%s%s%s",
> + quote_identifier(schema),
> + quote_identifier(typename),
> + typmodstr ? typmodstr : "",
> + array_decor);
>
> The last 2 parts:
> typmodstr ? typmodstr : "",
> array_decor);
>
> are common for all those above appendStringInfo, so you could reduce
> the code (if you want to) and just add the common parts at the end.
>
> e.g.
>
> if (schema == NULL)
> appendStringInfo(buf, "%s", quote_identifier(typename));
> else if (schema[0] == '\0')
> appendStringInfo(buf, "%s", typename); /* Special typmod needs */
> else
> appendStringInfo(buf, "%s.%s", quote_identifier(schema),
> quote_identifier(typename));
>
> appendStringInfo(buf, "%s%s", typmodstr ? typmodstr : "", array_decor);
>
>
fixed it accordingly.
> 39.3
> In other code (e.g. expand_jsonval_dottedname) you did lots of
> pfree(str) after using the strings, so why not similar here?
>
> ~~~
>
> 40. src/backend/commands/ddl_json.c - expand_jsonval_operator
>
> 40.1
> + /* schema might be NULL or empty */
>
> Uppercase comment
>
> 40.2
> Why no pfree(str) here similar to what there was in prior code (e.g.
> expand_jsonval_dottedname)?
>
> ~~~
>
> 41. src/backend/commands/ddl_json.c - expand_jsonval_string
>
> Comment says "The value must be of type string or of type object."
>
> Yeah, but what if it isn't? This code will just fall through and return
> true. Is that the right behaviour? Should there be an Assert at least?
>
> ~~~
>
> 42. src/backend/commands/ddl_json.c - expand_jsonval_number
>
> Does this need some pfree after the string is copied to 'buf'?
>
> ~~~
>
> 43 src/backend/commands/ddl_json.c - expand_jsonval_role
>
> + rolename = find_string_in_jsonbcontainer(jsonval->val.binary.data,
> + "rolename", false, NULL);
> + appendStringInfoString(buf, quote_identifier(rolename));
>
> Does this need some pfree after the string is copied to 'buf'?
>
> ~~~
>
> 44. src/backend/commands/ddl_json.c - ddl_deparse_json_to_string
>
> + d = DirectFunctionCall1(jsonb_in,
> + PointerGetDatum(json_str));
>
> Seems unnecessary wrapping here.
>
> ~~~
>
> 45. src/backend/commands/ddl_json.c - fmtstr_error_callback
>
> 45.1
> +/*
> + * Error context callback for JSON format string expansion.
> + *
> + * Possible improvement: indicate which element we're expanding, if applicable.
> + */
>
> Should that "Possible improvement" comment have "XXX" prefix like most
> other possible improvement comments have?
>
> 45.2
> +fmtstr_error_callback(void *arg)
> +{
> + errcontext("while expanding format string \"%s\"", (char *) arg);
> +
> +}
>
> Remove the blank line.
>
> ======
>
> 46. src/backend/utils/adt/ruleutils.c - pg_get_trigger_whenclause
>
> +char *
> +pg_get_trigger_whenclause(Form_pg_trigger trigrec, Node *whenClause,
> bool pretty)
>
> Missing function comment
>
> ~~~
>
> 47. src/backend/utils/adt/ruleutils.c - print_function_sqlbody
>
> @@ -3513,7 +3526,7 @@ pg_get_function_arg_default(PG_FUNCTION_ARGS)
> PG_RETURN_TEXT_P(string_to_text(str));
> }
>
> -static void
> +void
> print_function_sqlbody(StringInfo buf, HeapTuple proctup)
> {
>
> Having a function comment is more important now that this is no longer static.
>
> ------
fixed these.
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v17-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.1 KB |
| v17-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 22.5 KB |
| v17-0002-Support-DDL-replication.patch | application/octet-stream | 124.7 KB |
| v17-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 196.8 KB |
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-08-16 11:57:17 |
| Message-ID: | OS0PR01MB5716052AD854FBD52316C6BE946B9@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Saturday, August 13, 2022 11:27 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Fri, Aug 5, 2022 at 4:03 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > Hi Hou-san, here are my review comments for the patch v15-0001:
> > ...
> fixed these.
Attached is the new version patch set, which adds support for
CREATE/ALTER/DROP DOMAIN and CREATE/ALTER/DROP TYPE.
Thanks to Vignesh and Ajin for helping to write the new version patch set.
Best regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v18-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 223.7 KB |
| v18-0002-Support-DDL-replication.patch | application/octet-stream | 125.0 KB |
| v18-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.1 KB |
| v18-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-08-29 06:14:57 |
| Message-ID: | CAFPTHDYF8RKnPT=dF5vPqVMcFyJsC_u5vE8jg=-pGR4tbpg4+w@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, Aug 16, 2022 at 9:57 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Saturday, August 13, 2022 11:27 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> > On Fri, Aug 5, 2022 at 4:03 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > > Hi Hou-san, here are my review comments for the patch v15-0001:
> > > ...
> > fixed these.
>
> Attach the new version patch set which added support for
> CREATE/ALTER/DROP DOMAIN and
> CREATE/ALTER/DROP TYPE.
>
Attaching a new version patchset which added support for
CREATE/ALTER/DROP CONVERSION.
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v19-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v19-0002-Support-DDL-replication.patch | application/octet-stream | 124.8 KB |
| v19-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v19-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 225.4 KB |
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-09-05 14:18:30 |
| Message-ID: | CAFPTHDZY5dYZdMXMJra_uy3fU7v+FVJMCiEShLYooameZW5k=A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Mon, Aug 29, 2022 at 4:14 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
>
> Attaching a new version patchset which added support for
> CREATE/ALTER/DROP CONVERSION.
Adding support for CREATE/ALTER/DROP POLICY ddl deparsing
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v20-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v20-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 230.8 KB |
| v20-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v20-0002-Support-DDL-replication.patch | application/octet-stream | 124.9 KB |
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-09-15 12:38:19 |
| Message-ID: | CAFPTHDaqqGxqncAP42Z=w9GVXDR92HN-57O=2Zy6tmayV2_eZw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, Sep 6, 2022 at 12:18 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Mon, Aug 29, 2022 at 4:14 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> >
> > Attaching a new version patchset which added support for
> > CREATE/ALTER/DROP CONVERSION.
>
> Adding support for CREATE/ALTER/DROP POLICY ddl deparsing
>
Rebasing as the patch no longer applies.
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v21-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v21-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v21-0002-Support-DDL-replication.patch | application/octet-stream | 124.6 KB |
| v21-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 230.7 KB |
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-09-20 14:57:44 |
| Message-ID: | CAFPTHDaZdSuaQb2da776+jCZcWA2umGMOR3S4GD-CKw0r92v1Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, Sep 15, 2022 at 10:38 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Tue, Sep 6, 2022 at 12:18 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> > On Mon, Aug 29, 2022 at 4:14 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> > >
> > >
Attaching support for CREATE/ALTER/DROP EXTENSION DDL deparsing and
replication.
One of the issues faced during replication of "CREATE EXTENSION ..." was
that it results in internally generated commands to create functions,
operators, etc. If these SQL commands are replicated, then the actual
"CREATE EXTENSION" command, when replicated, will error out because of
the duplication. To avoid this, I used the "in_extension" variable in
the "collected command", which is set when an extension is being
created. If this variable is set, then SQL commands other than
"CREATE EXTENSION" are prevented from being replicated.
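Roughly, the check looks like the sketch below (names follow the
description above; is_create_extension_command() is a hypothetical helper
and the exact placement in the patch may differ):

    /*
     * While CREATE EXTENSION runs, internally generated commands are
     * collected with in_extension = true; skip deparsing those so that
     * only the CREATE EXTENSION statement itself gets replicated.
     */
    if (cmd->in_extension && !is_create_extension_command(cmd))
        return NULL;    /* nothing to replicate for this collected command */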
This patch-set also adds a 5th patch from Vignesh C for supporting the
"GRANT" command.
However the following global objects are not supported:
1) Foreign data wrapper
2) parameter - ex: grant all on parameter wal_buffers TO test;
3) tablespace - ex: grant create on tablespace tbs1 to test;
4) database - ex: grant all on database postgres to test;
5) role - ex: grant test to test1; -- no event trigger for global objects
Thanks Vignesh!
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v22-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v22-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v22-0005-Support-DDL-of-GRANT-Command.patch | application/octet-stream | 10.4 KB |
| v22-0002-Support-DDL-replication.patch | application/octet-stream | 124.7 KB |
| v22-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 235.8 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-09-21 16:55:35 |
| Message-ID: | CAAD30U+kgmQzVKNWNvQG5b6aO-m5tW409TZGRhzj6ZYaEud+rw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hello,
Attaching support for TEXT SEARCH commands, which includes:
-CREATE TEXT SEARCH CONFIGURATION/PARSER/DICTIONARY/TEMPLATE
-DROP TEXT SEARCH CONFIGURATION/PARSER/DICTIONARY/TEMPLATE
-ALTER TEXT SEARCH CONFIGURATION and ALTER TEXT SEARCH DICTIONARY
-ALTER TEXT SEARCH RENAME CONFIGURATION/PARSER/DICTIONARY/TEMPLATE
Regards,
Zheng
On Tue, Sep 20, 2022 at 10:57 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Thu, Sep 15, 2022 at 10:38 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> > On Tue, Sep 6, 2022 at 12:18 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Aug 29, 2022 at 4:14 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> > > >
> > > >
>
> Attaching support for CREATE/ALTER/DROP EXTENSION ddl deparsing and replication.
> One of the issues faced during replication of "CREATE EXTENSION.." was
> that it internally
> results in internally generated commands to create functions,
> operators etc. If these
> sql commands are replicated, then the actual "create extension"
> command when replicated
> will error out because of duplication. To avoid this, I used the
> "in_extension" variable in the
> "collected command" which is set when an extension is being created.
> If this variable is set, then
> avoid sql commands other than "CREATE EXTENSION" from being replicated.
>
>
> This patch-set also adds a 5th patch from Vignesh C for supporting the
> "GRANT" command.
>
> However the following global objects are not supported:
> 1) Foreign data wrapper
> 2) parameter - ex: grant all on parameter wal_buffers TO test;
> 3) tablespace - ex: grant create on tablespace tbs1 to test;
> 4) database - ex: grant all on database postgres to test;
> 5) role - ex: grant test to test1; -- no event trigger for global objects
>
> Thanks Vignesh!
>
> regards,
> Ajin Cherian
> Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v23-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v23-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v23-0002-Support-DDL-replication.patch | application/octet-stream | 124.9 KB |
| v23-0005-Support-DDL-of-GRANT-Command.patch | application/octet-stream | 10.4 KB |
| v23-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 235.9 KB |
| v23-0006-Add-DDL-deparser-support-for-TEXT-SEARCH-commands-wh.patch | application/octet-stream | 18.4 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-09-26 10:26:07 |
| Message-ID: | CALDaNm22hc3HbB1njVwt9QPDMi3pCOcYvqRP8LYEGpNkOS=gxQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hi,
The patch does not apply because of a recent commit. The updated patch
is rebased on top of HEAD.
Regards,
Vignesh
On Wed, 21 Sept 2022 at 22:26, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> Hello,
>
> Attaching support for TEXT SEARCH commands, which includes:
> -CREATE TEXT SEARCH CONFIGURATION/PARSER/DICTIONARY/TEMPLATE
> -DROP TEXT SEARCH CONFIGURATION/PARSER/DICTIONARY/TEMPLATE
> -ALTER TEXT SEARCH CONFIGURATION and ALTER TEXT SEARCH DICTIONARY
> -ALTER TEXT SEARCH RENAME CONFIGURATION/PARSER/DICTIONARY/TEMPLATE
>
> Regards,
> Zheng
>
> On Tue, Sep 20, 2022 at 10:57 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> > On Thu, Sep 15, 2022 at 10:38 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Sep 6, 2022 at 12:18 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, Aug 29, 2022 at 4:14 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> > > > >
> > > > >
> >
> > Attaching support for CREATE/ALTER/DROP EXTENSION ddl deparsing and replication.
> > One of the issues faced during replication of "CREATE EXTENSION.." was
> > that it internally
> > results in internally generated commands to create functions,
> > operators etc. If these
> > sql commands are replicated, then the actual "create extension"
> > command when replicated
> > will error out because of duplication. To avoid this, I used the
> > "in_extension" variable in the
> > "collected command" which is set when an extension is being created.
> > If this variable is set, then
> > avoid sql commands other than "CREATE EXTENSION" from being replicated.
> >
> >
> > This patch-set also adds a 5th patch from Vignesh C for supporting the
> > "GRANT" command.
> >
> > However the following global objects are not supported:
> > 1) Foreign data wrapper
> > 2) parameter - ex: grant all on parameter wal_buffers TO test;
> > 3) tablespace - ex: grant create on tablespace tbs1 to test;
> > 4) database - ex: grant all on database postgres to test;
> > 5) role - ex: grant test to test1; -- no event trigger for global objects
> >
> > Thanks Vignesh!
> >
> > regards,
> > Ajin Cherian
> > Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v23-0001-Functions-to-deparse-DDL-commands.patch | text/x-patch | 235.9 KB |
| v23-0002-Support-DDL-replication.patch | text/x-patch | 128.7 KB |
| v23-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | text/x-patch | 15.0 KB |
| v23-0004-Test-cases-for-DDL-replication.patch | text/x-patch | 24.6 KB |
| v23-0005-Support-DDL-of-GRANT-Command.patch | text/x-patch | 10.4 KB |
| v23-0006-Add-DDL-deparser-support-for-TEXT-SEARCH-command.patch | text/x-patch | 18.4 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-09-30 07:47:13 |
| Message-ID: | CALDaNm0VnaCg__huSDW=n=_rSGGES90cpOtqwZeWnA6muoz3oA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Mon, 26 Sept 2022 at 15:56, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> Hi,
>
> The patch does not apply because of a recent commit. The updated patch
> is rebased on top of HEAD.
>
Adding support for ddl replication of the following:
Create Rule
Alter Rule
Drop Rule
Create Foreign Data Wrapper
Alter Foreign Data Wrapper
Drop Foreign Data Wrapper
Revoke
Refresh Materialized View
Thanks to Ajin Cherian for providing the changes for Create/Alter/Drop
Foreign Data Wrapper.
The attached v24 patch has the changes for the same.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v24-0004-Test-cases-for-DDL-replication.patch | text/x-patch | 24.6 KB |
| v24-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | text/x-patch | 15.0 KB |
| v24-0005-Add-DDL-deparser-support-for-TEXT-SEARCH-command.patch | text/x-patch | 18.4 KB |
| v24-0002-Support-DDL-replication.patch | text/x-patch | 129.4 KB |
| v24-0001-Functions-to-deparse-DDL-commands.patch | text/x-patch | 257.0 KB |
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-06 08:30:47 |
| Message-ID: | CAHut+Pt=n55fROZJHEp9Aa1CKvDSHTgxGk+jt4-5NjkbAYaRgA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
The patches here are quite large, so for this review post, I have only
done a quick check for cosmetic stuff in the comments of patch
v24-0001.
~
I did this mostly just by cutting/pasting the whole patch text into a
grammar/spell-checker to see what it reported. Please use the same
strategy prior to posting future patch versions, because it will be
way more efficient/easier for the author to spend a few minutes to fix
everything like this up-front before posting, rather than getting a
flood of review comments to deal with (like this post) about such
stuff.
(BTW, most of these suggestions are just verbatim what my
grammar/spell-checker told me)
======
1. Commit comment
(Note #2) Note that, for ATTACH/DETACH PARTITION, we haven't added
extra logic on
subscriber to handle the case where the table on publisher is a PARTITIONED
TABLE while the target table on subcriber side is NORMAL table. We will
research this more and improve this later.
SUGGESTION (minor changes + fix typo)
(Note #2) Note that for ATTACH/DETACH PARTITION we haven't added extra logic on
the subscriber to handle the case where the table on the publisher is
a PARTITIONED
TABLE while the target table on the subscriber is a NORMAL table. We will
research this more and improve it later.
======
2. GENERAL - uppercase the comments
Please make all your single-line comments start with uppercase for consistency.
Here are some examples to fix:
Line 303: + /* add the "ON table" clause */
Line 331: + /* add the USING clause, if any */
Line 349: + /* add the WITH CHECK clause, if any */
Line 653: + /* otherwise, WITH TZ is added by typmod. */
Line 663: + /* otherwise, WITH TZ is added by typmode. */
Line 1946: + /* build list of privileges to grant/revoke */
Line 2017: + /* target objects. We use object identities here */
Line 2041: + /* list of grantees */
Line 2053: + /* the wording of the grant option is variable ... */
Line 2632: + /* skip this one; we add one unconditionally below */
Line 2660: + /* done */
Line 2768: + /* add HANDLER clause */
Line 2780: + /* add VALIDATOR clause */
Line 2792: + /* add an OPTIONS clause, if any */
Line 2845: + /* add HANDLER clause */
Line 2859: + /* add VALIDATOR clause */
Line 2877: + /* add an OPTIONS clause, if any */
Line 5024: + /* add the rest of the stuff */
Line 5040: + /* add the rest of the stuff */
Line 5185: + /* a new constraint has a name and definition */
Line 5211: + /* done */
Line 6124: + /* add a CONCURRENTLY clause */
Line 6127: + /* add the matview name */
Line 6131: + /* add a WITH NO DATA clause */
Line 6302: + /* reloptions */
Line 6310: + /* tablespace */
Line 6592: + /* deconstruct the name list */
Line 6636: + /* deconstruct the name list */
Line 6725: + /* obtain object tuple */
Line 6729: + /* obtain namespace */
------
3. Grammar/Spelling
3a. - format_type_detailed
+ * - nspid is the schema OID. For certain SQL-standard types which have weird
+ * typmod rules, we return InvalidOid; caller is expected to not schema-
+ * qualify the name nor add quotes to the type name in this case.
"caller" -> "the caller"
~
3b. - format_type_detailed
+ * - typemodstr is set to the typemod, if any, as a string with parens
"parens" -> "parentheses"
~
3c. - format_type_detailed
+ else
+ /* otherwise, WITH TZ is added by typmode. */
+ *typname = pstrdup("TIME");
"typmode" -> "typmod" ?
~
3d. - new_objtree_for_qualname
+ * A helper routine to setup %{}D and %{}O elements.
"setup" -> "set up"
~
3e. - new_objtree_for_type
+/*
+ * A helper routine to setup %{}T elements.
+ */
+static ObjTree *
+new_objtree_for_type(Oid typeId, int32 typmod)
"setup" -> "set up"
~
3f. - pg_get_indexdef_detailed
+/*
+ * Return an index definition, split in several pieces.
+ *
+ * A large amount of code is duplicated from pg_get_indexdef_worker, but
+ * control flow is different enough that it doesn't seem worth keeping them
+ * together.
+ */
+static void
+pg_get_indexdef_detailed(Oid indexrelid,
"split in" -> "split into"
~
3g. - ddl_deparse_to_json
+ * The command is expanded fully, so that there are no ambiguities even in the
+ * face of search_path changes.
+ */
+Datum
+ddl_deparse_to_json(PG_FUNCTION_ARGS)
"fully, so" -> "fully so"
~
3h. -deparse_AlterFunction
+ * Given a function OID and the parsetree that created it, return the JSON
+ * blob representing the alter command.
+ */
"the parsetree" -> "the parse tree"
~
3i. - deparse_AlterObjectSchemaStmt
+/*
+ * deparse an ALTER ... SET SCHEMA command.
+ */
+static ObjTree *
+deparse_AlterObjectSchemaStmt(ObjectAddress address, Node *parsetree,
"deparse" -> "Deparse"
~
3j. deparse_AlterObjectSchemaStmt
+ /*
+ * Since the command has already taken place from the point of view of
+ * catalogs, getObjectIdentity returns the object name with the already
+ * changed schema. The output of our deparsing must return the original
+ * schema name however, so we chop the schema name off the identity string
+ * and then prepend the quoted schema name.
"name however," -> "name, however,"
~
3k. - deparse_GrantStmt
+ /* build list of privileges to grant/revoke */
+ if (istmt->all_privs)
"build list" -> "build a list"
~
3l. - deparse_AlterTypeSetStmt
+ * Deparse an AlterTypeStmt.
+ *
+ * Given a type OID and a parsetree that modified it, return an ObjTree
+ * representing the alter type.
+ */
+static ObjTree *
+deparse_AlterTypeSetStmt(Oid objectId, Node *cmd)
"parsetree" -> "parse tree"
~
3m. - deparse_CompositeTypeStmt
+ * Deparse a CompositeTypeStmt (CREATE TYPE AS)
+ *
+ * Given a type OID and the parsetree that created it, return an ObjTree
+ * representing the creation command.
+ */
"parsetree" -> "parse tree"
~
3n. - deparse_CreateEnumStmt
+/*
+ * Deparse a CreateEnumStmt (CREATE TYPE AS ENUM)
+ *
+ * Given a type OID and the parsetree that created it, return an ObjTree
+ * representing the creation command.
+ */
"parsetree" -> "parse tree"
~
3o. - deparse_CreateExtensionStmt
+/*
+ * deparse_CreateExtensionStmt
+ * deparse a CreateExtensionStmt
+ *
+ * Given an extension OID and the parsetree that created it, return the JSON
+ * blob representing the creation command.
+ *
+ */
"parsetree" -> "parse tree"
~
3p. - deparse_CreateFdwStmt
+/*
+ * deparse_CreateFdwStmt
+ * Deparse a CreateFdwStmt (CREATE FOREIGN DATA WRAPPER)
+ *
+ * Given a trigger OID and the parsetree that created it,
+ * return an ObjTree representing the creation command.
+ */
"parsetree" -> "parse tree"
~
3q. - deparse_AlterFdwStmt
+/*
+ * deparse_AlterFdwStmt
+ * Deparse an AlterFdwStmt (ALTER FOREIGN DATA WRAPPER)
+ *
+ * Given a function OID and the parsetree that create it, return the
+ * JSON blob representing the alter command.
+ */
"parsetree" -> "parse tree"
"create it" -> "created it"
3r.
+ /*
+ * Iterate through options, to see what changed, but use catalog as basis
+ * for new values.
+ */
"as basis" -> "as a basis"
~
3s.
+/*
+ * Deparse a CREATE TYPE AS RANGE statement
+ *
+ * Given a type OID and the parsetree that created it, return an ObjTree
+ * representing the creation command.
+ */
"parsetree" -> "parse tree"
~
3t. - deparse_AlterEnumStmt
+/*
+ * Deparse an AlterEnumStmt.
+ *
+ * Given a type OID and a parsetree that modified it, return an ObjTree
+ * representing the alter type.
+ */
"parsetree" -> "parse tree"
~
3u. - deparse_AlterTableStmt
+ /*
+ * We don't support replicating ALTER TABLE which contains volatile
+ * functions because It's possible the functions contain DDL/DML in
+ * which case these opertions will be executed twice and cause
+ * duplicate data. In addition, we don't know whether the tables being
+ * accessed by these DDL/DML are published or not. So blindly allowing
+ * such functions can allow unintended clauses like the tables accessed
+ * in those functions may not even exist on the subscriber-side.
+ */
"opertions" -> "operations"
"subscriber-side." -> "subscriber."
~
3v. - deparse_ColumnDef
+ * Deparse a ColumnDef node within a regular (non typed) table creation.
"non typed" -> "non-typed"
~
3w. - deparse_ColumnDef
+ /*
+ * Emit a NOT NULL declaration if necessary. Note that we cannot trust
+ * pg_attribute.attnotnull here, because that bit is also set when
+ * primary keys are specified; and we must not emit a NOT NULL
+ * constraint in that case, unless explicitely specified. Therefore,
+ * we scan the list of constraints attached to this column to determine
+ * whether we need to emit anything.
+ * (Fortunately, NOT NULL constraints cannot be table constraints.)
+ *
+ * In the ALTER TABLE cases, we also add a NOT NULL if the colDef is
+ * marked is_not_null.
+ */
"specified; and we" -> "specified; we"
"explicitely" -> "explicitly"
~
3x. - deparse_CreateDomain
+/*
+ * Deparse the CREATE DOMAIN
+ *
+ * Given a function OID and the parsetree that created it, return the JSON
+ * blob representing the creation command.
+ */
"parsetree" -> "parse tree"
~
3y. - deparse_CreateFunction
+/*
+ * Deparse a CreateFunctionStmt (CREATE FUNCTION)
+ *
+ * Given a function OID and the parsetree that created it, return the JSON
+ * blob representing the creation command.
+ */
"parsetree" -> "parse tree"
~
3z. - deparse_CreateFunction
+ /*
+ * Now iterate over each parameter in the parsetree to create the
+ * parameters array.
+ */
"parsetree" -> "parse tree"
~
4a. - deparse_CreateFunction
+ /*
+ * A PARAM_TABLE parameter indicates end of input arguments; the
+ * following parameters are part of the return type. We ignore them
+ * here, but keep track of the current position in the list so that
+ * we can easily produce the return type below.
+ */
"end" -> "the end"
~
4b. - deparse_CreateOpClassStmt
+ /* Don't deparse sql commands generated while creating extension */
+ if (cmd->in_extension)
+ return NULL;
"sql" -> "SQL"
~
4c. - deparse_CreateOpClassStmt
/*
+ * Add the FAMILY clause; but if it has the same name and namespace as the
+ * opclass, then have it expand to empty, because it would cause a failure
+ * if the opfamily was created internally.
+ */
"clause; " -> "clause, "
"empty," -> "empty"
~
4d. - deparse_CreateOpFamily
+/*
+ * Deparse a CreateTrigStmt (CREATE OPERATOR FAMILY)
+ *
+ * Given a trigger OID and the parsetree that created it, return an ObjTree
+ * representing the creation command.
+ */
"parsetree" -> "parse tree"
~
4e. - deparse_CreateSchemaStmt
+/*
+ * Deparse a CreateSchemaStmt.
+ *
+ * Given a schema OID and the parsetree that created it, return an ObjTree
+ * representing the creation command.
+ *
+ */
"parsetree" -> "parse tree"
~
4f. - deparse_CreateSeqStmt
+/*
+ * Deparse a CreateSeqStmt.
+ *
+ * Given a sequence OID and the parsetree that create it, return an ObjTree
+ * representing the creation command.
+ */
"parsetree" -> "parse tree"
"create it" -> "created it"
~
4g. - deparse_CreateStmt
+/*
+ * Deparse a CreateStmt (CREATE TABLE).
+ *
+ * Given a table OID and the parsetree that created it, return an ObjTree
+ * representing the creation command.
+ */
"parsetree" -> "parse tree"
~
4h.
+ /*
+ * Typed tables and partitions use a slightly different format string: we
+ * must not put table_elements with parents directly in the fmt string,
+ * because if there are no options the parens must not be emitted; and
+ * also, typed tables do not allow for inheritance.
+ */
"parens" -> "parentheses"
~
4i. - deparse_CreateStmt
+ /*
+ * We can't put table elements directly in the fmt string as an array
+ * surrounded by parens here, because an empty clause would cause a
+ * syntax error. Therefore, we use an indirection element and set
+ * present=false when there are no elements.
+ */
"parens" -> "parentheses"
~
4j. - deparse_CreateStmt
+ /*
+ * Get pg_class.relpartbound. We cannot use partbound in the
+ * parsetree directly as it's the original partbound expression
+ * which haven't been transformed.
+ */
"parsetree" -> "parse tree" ? maybe this one if ok if it referring to
the parameter with this name.
~
4k. - deparse_DefineStmt_Operator
+/*
+ * Deparse a DefineStmt (CREATE OPERATOR)
+ *
+ * Given a trigger OID and the parsetree that created it, return an ObjTree
+ * representing the creation command.
+ */
"parsetree" -> "parse tree"
~
4l. - deparse_CreateTrigStmt
+/*
+ * Deparse a CreateTrigStmt (CREATE TRIGGER)
+ *
+ * Given a trigger OID and the parsetree that created it, return an ObjTree
+ * representing the creation command.
+ */
"parsetree" -> "parsetree"
~
4m. - deparse_RefreshMatViewStmt
+/*
+ * Deparse a RefreshMatViewStmt (REFRESH MATERIALIZED VIEW)
+ *
+ * Given a materialized view OID and the parsetree that created it, return an
+ * ObjTree representing the refresh command.
+ */
"parseree" -> "parse tree"
~
4n. - deparse_IndexStmt
+/*
+ * Deparse an IndexStmt.
+ *
+ * Given an index OID and the parsetree that created it, return an ObjTree
+ * representing the creation command.
+ *
+ * If the index corresponds to a constraint, NULL is returned.
+ */
"parsetree" -> "parse tree"
~
4o. - deparse_InhRelations
+/*
+ * Deparse the INHERITS relations.
+ *
+ * Given a table OID, return a schema qualified table list representing
+ * the parent tables.
+ */
+static List *
+deparse_InhRelations(Oid objectId)
"schema qualified" -> "schema-qualified"
~
4p. - deparse_OnCommitClause
+/*
+ * Deparse the ON COMMMIT ... clause for CREATE ... TEMPORARY ...
+ */
+static ObjTree *
+deparse_OnCommitClause(OnCommitAction option)
"COMMMIT" -> "COMMIT"
~
4q. - deparse_RuleStmt
+/*
+ * Deparse a RuleStmt (CREATE RULE).
+ *
+ * Given a rule OID and the parsetree that created it, return an ObjTree
+ * representing the rule command.
+ */
"parsetree" -> "parse tree"
~
4r. - deparse_utility_command
+ /*
+ * Allocate everything done by the deparsing routines into a temp context,
+ * to avoid having to sprinkle them with memory handling code; but allocate
+ * the output StringInfo before switching.
+ */
"code; " -> "code, "
~
4s. - deparse_utility_command
+ /*
+ * Many routines underlying this one will invoke ruleutils.c functionality
+ * in order to obtain deparsed versions of expressions. In such results,
+ * we want all object names to be qualified, so that results are "portable"
+ * to environments with different search_path settings. Rather than inject
+ * what would be repetitive calls to override search path all over the
+ * place, we do it centrally here.
+ */
"in order to" => "to"
~
4t. - convSpecifier
+/*
+ * Conversion specifier, it determines how we expand the json element into
+ * string.
+ */
SUGGESTION
Conversion specifier which determines how we expand the json element
into a string.
~
4u. - json_trivalue
+/*
+ * A ternary value which represents a boolean type JsonbValue.
+ */
"which represents" -> "that represents"
~
4v. - expand_fmt_recursive
+ /*
+ * Scan the mandatory element name. Allow for an array separator
+ * (which may be the empty string) to be specified after colon.
+ */
"after colon" -> "after a colon"
~
4w. - expand_jsonval_string
+/*
+ * Expand a json value as a string. The value must be of type string or of
+ * type object. In the latter case it must contain a "fmt" element which will
+ * be recursively expanded; also, if the object contains an element "present"
+ * and it is set to false, the expansion is the empty string.
+ *
+ * Returns false if no actual expansion was made due to the "present" flag
+ * being set to "false".
+ */
"latter case" -> "latter case,"
~
4x. - format_procedure_args_internal
+/*
+ * Append the parenthised arguments of the given pg_proc row into the output
+ * buffer. force_qualify indicates whether to schema-qualify type names
+ * regardless of visibility.
+ */
"parenthised" -> "parenthesized "
~
4y.
+/*
+ * Internal version that returns definition of a CONSTRAINT command
+ */
"definition" -> "the definition"
======
5. Function comment inconsistencies
5a.
Sometimes the function name is repeated in the comment and sometimes it is not.
e.g. compare deparse_CreateEnumStmt() versus deparse_CreateExtensionStmt(), etc.
(IMO there is no need to repeat the function name)
~
5b.
Sometimes the deparse function comments are verbose and say something like:
+ * Given a type OID and a parsetree that modified it, return an ObjTree
+ * representing the alter type.
but sometimes - like deparse_AlterExtensionStmt() etc. - they don't
bother to say anything at all.
e.g.
+/*
+ * Deparse ALTER EXTENSION .. UPDATE TO VERSION
+ */
+static ObjTree *
+deparse_AlterExtensionStmt(Oid objectId, Node *parsetree)
Either it is useful, so say it always, or it is not useful, so say it
never. Pick one.
------
6. GENERAL - json
IMO change "json" -> "JSON" everywhere.
Here are some examples:
Line 7605: + * Conversion specifier, it determines how we expand the
json element into
Line 7699: + errmsg("missing element \"%s\" in json object", keyname)));
Line 7857: + * Expand a json value as a quoted identifier. The value
must be of type string.
Line 7872: + * Expand a json value as a dot-separated-name. The value
must be of type
Line 7908: + * Expand a json value as a type name.
Line 7966: + * Expand a json value as an operator name
Line 7993: + * Expand a json value as a string. The value must be of
type string or of
Line 8031: + * Expand a json value as a string literal.
Line 8070: + * Expand a json value as an integer quantity.
Line 8083: + * Expand a json value as a role name. If the is_public
element is set to
Line 8111: + * Expand one json element into the output StringInfo
according to the
Line 8807: +{ oid => '4642', descr => 'deparse the DDL command into
json format string',
Line 8810: +{ oid => '4643', descr => 'expand json format DDL to a
plain DDL command',
------
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-06 16:40:10 |
| Message-ID: | CAAD30UKg8rXeGM8Oy_MAmxKBL_K5DiHXdeNF=hUefcu1C_6VfQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hi,
Attaching the v25 patch set, which adds support for:
CREATE TRANSFORM
DROP TRANSFORM
These are relatively new commands so they're not in the original DDL
deparser patch.
Some tweaking is made in deparse_drop_command in order to make DROP
TRANSFORM deparsing work. This is because the objidentity captured in
currentEventTriggerState->SQLDropList contains the keyword 'on', for
example "for typename on language lang", but the keyword 'on' is not
needed in the current DROP TRANSFORM syntax. So we need to remove the
'on' keyword in objidentity. I'm not sure if this is the best way to
handle it, maybe
we can consider directly modifying what's captured in
currentEventTriggerState->SQLDropList
so we don't have the "on" keyword to begin with?
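To make the mismatch concrete (the type and language names below are just
examples, not taken from the patch):
-- Identity currently captured in SQLDropList:
--   for my_type on language plpython3u
-- Syntax the deparsed command has to produce (no 'on' before the language):
DROP TRANSFORM FOR my_type LANGUAGE plpython3u;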
BTW, my colleague Runqi started a new thread to discuss DDL deparser
testing in [1].
Your feedback is appreciated.
With Regards,
Zheng Li
| Attachment | Content-Type | Size |
|---|---|---|
| v25-0005-Add-DDL-deparser-support-for-TEXT-SEARCH-commands-wh.patch | application/octet-stream | 18.4 KB |
| v25-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v25-0002-Support-DDL-replication.patch | application/octet-stream | 129.4 KB |
| v25-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v25-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 257.0 KB |
| v25-0006-Support-CREATE-TRANSFORM-and-DROP-TRANSFORM-commands.patch | application/octet-stream | 8.6 KB |
| From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-06 17:16:01 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On 2022-Oct-06, Zheng Li wrote:
> Some tweaking is made in deparse_drop_command in order to make DROP
> TRANSFORM deparsing work. This is because the objidentity captured in
> currentEventTriggerState->SQLDropList contains the keyword 'on', for
> example "for typename on language lang", but the keyword 'on' is not
> needed in the current DROP TRANSFORM syntax. So we need to remove the
> 'on' keyword in objidentity. I'm not sure if this is the best way to
> handle it, maybe we can consider directly modifying what's captured in
> currentEventTriggerState->SQLDropList
> so we don't have the "on" keyword to begin with?
The exact output format for identity is not set in stone; we should only
set it in stone once we have an actual working case for them. This is
the first such use, so it seems OK to make minor modifications (such as
removing an undesirable ON) if it's a reasonable change and allows
consumer code to be more easily written.
So, +1 to dropping ON here. However, if there are further strings that
need to be modified, let's see what they are.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
<Schwern> It does it in a really, really complicated way
<crab> why does it need to be complicated?
<Schwern> Because it's MakeMaker.
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-06 21:30:09 |
| Message-ID: | CAAD30UJh+LcUND+zg_A5dbQ_Bi=m_n7qUfKrOwAx2v4jBKWvKQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> > Some tweaking is made in deparse_drop_command in order to make DROP
> > TRANSFORM deparsing work. This is because the objidentity captured in
> > currentEventTriggerState->SQLDropList contains the keyword 'on', for
> > example "for typename on language lang", but the keyword 'on' is not
> > needed in the current DROP TRANSFORM syntax. So we need to remove the
> > 'on' keyword in objidentity. I'm not sure if this is the best way to
> > handle it, maybe we can consider directly modifying what's captured in
> > currentEventTriggerState->SQLDropList
> > so we don't have the "on" keyword to begin with?
>
> The exact output format for identity is not set in stone; we should only
> set it in stone once we have an actual working case for them. This is
> the first such use, so it seems OK to make minor modifications (such as
> removing an undesirable ON) if it's a reasonable change and allows
> consumer code to be more easily written.
> So, +1 to dropping ON here. However, if there are further strings that
> need to be modified, let's see what they are.
Thanks for confirming. Attaching the new patch set that removes the
undesirable ON from getObjectIdentityParts() for TRANSFORM.
Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| v25-0002-Support-DDL-replication.patch | application/octet-stream | 129.4 KB |
| v25-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v25-0005-Add-DDL-deparser-support-for-TEXT-SEARCH-commands-wh.patch | application/octet-stream | 18.4 KB |
| v25-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v25-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 257.0 KB |
| v25-0006-Support-CREATE-TRANSFORM-and-DROP-TRANSFORM-commands.patch | application/octet-stream | 9.8 KB |
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-07 00:01:31 |
| Message-ID: | CAFPTHDbVujj6C3TdMCmoBXovpc4=5Ow3i5M1_HNhmnqdiA5qSA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Oct 7, 2022 at 8:30 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> > > Some tweaking is made in deparse_drop_command in order to make DROP
> > > TRANSFORM deparsing work. This is because the objidentity captured in
> > > currentEventTriggerState->SQLDropList contains the keyword 'on', for
> > > example "for typename on language lang", but the keyword 'on' is not
> > > needed in the current DROP TRANSFORM syntax. So we need to remove the
> > > 'on' keyword in objidentity. I'm not sure if this is the best way to
> > > handle it, maybe we can consider directly modifying what's captured in
> > > currentEventTriggerState->SQLDropList
> > > so we don't have the "on" keyword to begin with?
> >
> > The exact output format for identity is not set in stone; we should only
> > set it in stone once we have an actual working case for them. This is
> > the first such use, so it seems OK to make minor modifications (such as
> > removing an undesirable ON) if it's a reasonable change and allows
> > consumer code to be more easily written.
>
> > So, +1 to dropping ON here. However, if there are further strings that
> > need to be modified, let's see what they are.
>
> Thanks for confirming. Attaching the new patch set that removes the
> undesirable ON from getObjectIdentityParts() for TRANSFORM.
>
Thanks for the new patch-set.
Could you add the changes to patch 1 and patch 2, rather than adding a
new patch?
Otherwise, we'll have a separate patch for each command and it will
take double work to keep it updated
for each new command added.
thanks,
Ajin Cherian
Fujitsu Australia
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-07 02:40:25 |
| Message-ID: | CAAD30UKCCpMTZd4WSSxYu-qRAEFd+0kjXTgje+EXHdo1JTkz0g@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> Thanks for the new patch-set.
> Could you add the changes to patch 1 and patch 2, rather than adding a
> new patch?
> Otherwise, we'll have a separate patch for each command and it will
> take double work to keep it updated
> for each new command added.
OK. Here is the consolidated patch set. TEXT SEARCH and TRANSFORM
commands are consolidated into patch 1 and patch 2.
I was wondering if keeping the changes for each command separate makes
it easier to review. Maybe just for some commands that need special
handling.
With Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| v25-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v25-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v25-0002-Support-DDL-replication.patch | application/octet-stream | 130.2 KB |
| v25-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 278.7 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-07 05:07:20 |
| Message-ID: | CALDaNm3F9GvQ9bPrxobhqkUKP3HmrRZGCU3EX0xt3=Ef0-Reaw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, 7 Oct 2022 at 08:10, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> > Thanks for the new patch-set.
> > Could you add the changes to patch 1 and patch 2, rather than adding a
> > new patch?
> > Otherwise, we'll have a separate patch for each command and it will
> > take double work to keep it updated
> > for each new command added.
>
> OK. Here is the consolidated patch set. TEXT SEARCH and TRANSFORM
> commands are consolidated into patch 1 and patch 2.
> I was wondering if keeping the changes for each command separate makes
> it easier to review. Maybe just for some commands that need special
> handling.
I agree, let's keep the changes for the commands in patch 1 and patch
2 if we are following the same parsing mechanism that is being
followed for the existing commands. If any command needs to be handled
in a more complex manner which is different from the existing parsing
mechanism then we can think of keeping it in a separate patch.
CFBOT at [1] was failing as some additional changes were required to
handle the meson build errors and warnings. The attached patch has the
changes for the same.
[1] - https://cirrus-ci.com/task/5343002330660864
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v25-0004-Test-cases-for-DDL-replication.patch | text/x-patch | 24.6 KB |
| v25-0002-Support-DDL-replication.patch | text/x-patch | 131.0 KB |
| v25-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | text/x-patch | 15.0 KB |
| v25-0001-Functions-to-deparse-DDL-commands.patch | text/x-patch | 279.0 KB |
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-10 02:38:26 |
| Message-ID: | CAHut+PtKiTmcQ7zXs6YvR-qtuMQ9wgffnfamqCAVpM_ETa2LCg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hi,
Please include all known information about how this feature looks from
the user's POV.
Ideally, this information should be in the form of PGDOCS updates
included in patch 00001.
I think documenting these details should not be deferred: reviewers
will want to experiment with the feature, and also to be able to
consider any behaviour pros/cons.
~
e.g. I can guess from the patch 0004 test cases that it currently
looks something like this:
CREATE PUBLICATION mypub FOR ALL TABLES with (publish = 'insert,
update, delete, ddl')
But without documented details, that just raises more questions:
e.g.
- Does it mean the DDL replication is all-or-nothing? What if I only want
to create missing tables, but not everything else?
- Is the 'ddl' publish option able to be specified when using CREATE
PUBLICATION ... FOR ALL TABLES?
- What if I want to replicate DDL but not data (e.g. for making an
empty "test" database)? Will just saying publish='ddl' allow that?
- etc.
------
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-10 09:58:48 |
| Message-ID: | CAFPTHDY2O42ouQFMeEbPt51CWQ=zyzYhgK6B9basyd5PLaOv0A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Oct 7, 2022 at 4:07 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Fri, 7 Oct 2022 at 08:10, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> >
> > > Thanks for the new patch-set.
> > > Could you add the changes to patch 1 and patch 2, rather than adding a
> > > new patch?
> > > Otherwise, we'll have a separate patch for each command and it will
> > > take double work to keep it updated
> > > for each new command added.
> >
> > OK. Here is the consolidated patch set. TEXT SEARCH and TRANSFORM
> > commands are consolidated into patch 1 and patch 2.
> > I was wondering if keeping the changes for each command separate makes
> > it easier to review. Maybe just for some commands that need special
> > handling.
>
> I agree, let's keep the changes for the commands in patch 1 and patch
> 2 if we are following the same parsing mechanism that is being
> followed for the existing commands. If any command needs to be handled
> in a more complex manner which is different from the existing parsing
> mechanism then we can think of keeping it in a separate patch.
>
> CFBOT at [1] was failing as some additional changes were required to
> handle the meson build errors and warnings. The attached patch has the
> changes for the same.
> [1] - https://cirrus-ci.com/task/5343002330660864
Adding support for DDL deparsing of CREATE/ALTER/DROP USER MAPPING, and
changes from Vignesh C for:
CREATE/ALTER/DROP SERVER
CREATE/ALTER/DROP COLLATION
I had to change the object identity of the user mapping as part of the deparsing
to make the DROP command work on user mapping.
FROM: "<role> on server <servername>"
TO : "for <role> server <servername>"
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v26-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v26-0002-Support-DDL-replication.patch | application/octet-stream | 131.9 KB |
| v26-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v26-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 292.6 KB |
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-11 13:30:37 |
| Message-ID: | CAFPTHDZmS2PrOyJx8OAe+Nt-Fx4-GZetJatqCEJhDNXA2orwCg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Mon, Oct 10, 2022 at 8:58 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Fri, Oct 7, 2022 at 4:07 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Fri, 7 Oct 2022 at 08:10, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > >
> > > > Thanks for the new patch-set.
> > > > Could you add the changes to patch 1 and patch 2, rather than adding a
> > > > new patch?
> > > > Otherwise, we'll have a separate patch for each command and it will
> > > > take double work to keep it updated
> > > > for each new command added.
> > >
> > > OK. Here is the consolidated patch set. TEXT SEARCH and TRANSFORM
> > > commands are consolidated into patch 1 and patch 2.
> > > I was wondering if keeping the changes for each command separate makes
> > > it easier to review. Maybe just for some commands that need special
> > > handling.
> >
> > I agree, let's keep the changes for the commands in patch 1 and patch
> > 2 if we are following the same parsing mechanism that is being
> > followed for the existing commands. If any command needs to be handled
> > in a more complex manner which is different from the existing parsing
> > mechanism then we can think of keeping it in a separate patch.
> >
> > CFBOT at [1] was failing as some additional changes were required to
> > handle the meson build errors and warnings. The attached patch has the
> > changes for the same.
> > [1] - https://cirrus-ci.com/task/5343002330660864
>
> Adding support for DDL deparsing of CREATE/ALTER/DROP USER MAPPING, and
>
> changes from Vignesh C for:
> CREATE/ALTER/DROP SERVER
> CREATE/ALTER/DROP COLLATION
>
>
> I had to change the object identity of the user mapping as part of the deparsing
> to make the DROP command work on user mapping.
>
> FROM: "<role> on server <servername>"
> TO : "for <role> server <servername>"
>
>
Rebased the patch set, as a recent commit broke it. Also fixed some whitespace issues.
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v27-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v27-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v27-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 292.6 KB |
| v27-0002-Support-DDL-replication.patch | application/octet-stream | 131.7 KB |
| From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-12 06:07:51 |
| Message-ID: | CAFiTN-tWkJW-JEGANkK+O3KXUjv_Yb5Tb+r83ujbognHG_brTA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, Oct 11, 2022 at 7:00 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
I was going through the 0001 patch and I have a few comments/suggestions.
1.
@@ -6001,7 +6001,7 @@ getObjectIdentityParts(const ObjectAddress *object,
transformType = format_type_be_qualified(transform->trftype);
transformLang = get_language_name(transform->trflang, false);
- appendStringInfo(&buffer, "for %s on language %s",
+ appendStringInfo(&buffer, "for %s language %s",
transformType,
transformLang);
How is this change related to this patch?
2.
+typedef struct ObjTree
+{
+ slist_head params;
+ int numParams;
+ StringInfo fmtinfo;
+ bool present;
+} ObjTree;
+
+typedef struct ObjElem
+{
+ char *name;
+ ObjType objtype;
+
+ union
+ {
+ bool boolean;
+ char *string;
+ int64 integer;
+ float8 flt;
+ ObjTree *object;
+ List *array;
+ } value;
+ slist_node node;
+} ObjElem;
It would be good to explain these structure members from a readability POV.
3.
+
+bool verbose = true;
+
I do not understand the usage of such a global variable. Even if it is
required, add some comments to explain its purpose.
4.
+/*
+ * Given a CollectedCommand, return a JSON representation of it.
+ *
+ * The command is expanded fully, so that there are no ambiguities even in the
+ * face of search_path changes.
+ */
+Datum
+ddl_deparse_to_json(PG_FUNCTION_ARGS)
+{
It would be nice to have a test case for this utility function.
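As a starting point, something along these lines might work. This is only a
sketch, assuming ddl_deparse_to_json() can be applied to the command column
returned by pg_event_trigger_ddl_commands() (please check the actual
signature in the patch); all object names here are made up:
-- Collect the JSON deparse output of every DDL command that fires the trigger.
CREATE TABLE deparsed_ddls (ddl_json text);
CREATE FUNCTION capture_ddl_json() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT * FROM pg_event_trigger_ddl_commands()
    LOOP
        -- Store the deparsed blob so it can be compared against expected output.
        INSERT INTO deparsed_ddls VALUES (ddl_deparse_to_json(r.command)::text);
    END LOOP;
END;
$$;
CREATE EVENT TRIGGER capture_ddl ON ddl_command_end
    EXECUTE FUNCTION capture_ddl_json();
-- Any DDL run after this point should leave its JSON representation behind.
CREATE TABLE ddl_test_tab (a int PRIMARY KEY, b text);
SELECT ddl_json FROM deparsed_ddls;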
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-13 12:46:39 |
| Message-ID: | CAFPTHDZ1b2Vjnh0up+C1Nrcy=zrS3X=vnQSqFc8BuNqPoR6Jig@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, Oct 6, 2022 at 7:31 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> The patches here are quite large, so for this review post, I have only
> done a quick check for cosmetic stuff in the comments of patch
> v24-0001.
>
> ~
>
> I did this mostly just by cutting/pasting the whole patch text into a
> grammar/spell-checker to see what it reported. Please use the same
> strategy prior to posting future patch versions, because it will be
> way more efficient/easier for the author to spend a few minutes to fix
> everything like this up-front before posting, rather than getting a
> flood of review comments to deal with (like this post) about such
> stuff.
>
> (BTW, most of these suggestions are just verbatim what my
> grammar/spell-checker told me)
>
> ======
>
> 1. Commit comment
>
> (Note #2) Note that, for ATTACH/DETACH PARTITION, we haven't added
> extra logic on
> subscriber to handle the case where the table on publisher is a PARTITIONED
> TABLE while the target table on subcriber side is NORMAL table. We will
> research this more and improve this later.
>
> SUGGESTION (minor changes + fix typo)
> (Note #2) Note that for ATTACH/DETACH PARTITION we haven't added extra logic on
> the subscriber to handle the case where the table on the publisher is
> a PARTITIONED
> TABLE while the target table on the subscriber is a NORMAL table. We will
> research this more and improve it later.
>
> ======
>
> 2. GENERAL - uppercase the comments
>
> Please make all your single-line comments start with uppercase for consistency.
>
> Here are some examples to fix:
>
> Line 303: + /* add the "ON table" clause */
> Line 331: + /* add the USING clause, if any */
> Line 349: + /* add the WITH CHECK clause, if any */
> Line 653: + /* otherwise, WITH TZ is added by typmod. */
> Line 663: + /* otherwise, WITH TZ is added by typmode. */
> Line 1946: + /* build list of privileges to grant/revoke */
> Line 2017: + /* target objects. We use object identities here */
> Line 2041: + /* list of grantees */
> Line 2053: + /* the wording of the grant option is variable ... */
> Line 2632: + /* skip this one; we add one unconditionally below */
> Line 2660: + /* done */
> Line 2768: + /* add HANDLER clause */
> Line 2780: + /* add VALIDATOR clause */
> Line 2792: + /* add an OPTIONS clause, if any */
> Line 2845: + /* add HANDLER clause */
> Line 2859: + /* add VALIDATOR clause */
> Line 2877: + /* add an OPTIONS clause, if any */
> Line 5024: + /* add the rest of the stuff */
> Line 5040: + /* add the rest of the stuff */
> Line 5185: + /* a new constraint has a name and definition */
> Line 5211: + /* done */
> Line 6124: + /* add a CONCURRENTLY clause */
> Line 6127: + /* add the matview name */
> Line 6131: + /* add a WITH NO DATA clause */
> Line 6302: + /* reloptions */
> Line 6310: + /* tablespace */
> Line 6592: + /* deconstruct the name list */
> Line 6636: + /* deconstruct the name list */
> Line 6725: + /* obtain object tuple */
> Line 6729: + /* obtain namespace */
>
> ------
>
> 3. Grammar/Spelling
>
> 3a. - format_type_detailed
>
> + * - nspid is the schema OID. For certain SQL-standard types which have weird
> + * typmod rules, we return InvalidOid; caller is expected to not schema-
> + * qualify the name nor add quotes to the type name in this case.
>
> "caller" -> "the caller"
>
> ~
>
> 3b. - format_type_detailed
>
> + * - typemodstr is set to the typemod, if any, as a string with parens
>
> "parens" -> "parentheses"
>
> ~
>
> 3c. - format_type_detailed
>
> + else
> + /* otherwise, WITH TZ is added by typmode. */
> + *typname = pstrdup("TIME");
>
> "typmode" -> "typmod" ?
>
> ~
>
> 3d. - new_objtree_for_qualname
>
> + * A helper routine to setup %{}D and %{}O elements.
>
> "setup" -> "set up"
>
> ~
>
> 3e. - new_objtree_for_type
>
> +/*
> + * A helper routine to setup %{}T elements.
> + */
> +static ObjTree *
> +new_objtree_for_type(Oid typeId, int32 typmod)
>
> "setup" -> "set up"
>
> ~
>
> 3f. - pg_get_indexdef_detailed
> +/*
> + * Return an index definition, split in several pieces.
> + *
> + * A large amount of code is duplicated from pg_get_indexdef_worker, but
> + * control flow is different enough that it doesn't seem worth keeping them
> + * together.
> + */
> +static void
> +pg_get_indexdef_detailed(Oid indexrelid,
>
> "split in" -> "split into"
>
> ~
>
> 3g. - ddl_deparse_to_json
>
> + * The command is expanded fully, so that there are no ambiguities even in the
> + * face of search_path changes.
> + */
> +Datum
> +ddl_deparse_to_json(PG_FUNCTION_ARGS)
>
> "fully, so" -> "fully so"
>
> ~
>
> 3h. -deparse_AlterFunction
>
> + * Given a function OID and the parsetree that created it, return the JSON
> + * blob representing the alter command.
> + */
>
> "the parsetree" -> "the parse tree"
>
> ~
>
> 3i. - deparse_AlterObjectSchemaStmt
>
> +/*
> + * deparse an ALTER ... SET SCHEMA command.
> + */
> +static ObjTree *
> +deparse_AlterObjectSchemaStmt(ObjectAddress address, Node *parsetree,
>
> "deparse" -> "Deparse"
>
> ~
>
> 3j. deparse_AlterObjectSchemaStmt
>
> + /*
> + * Since the command has already taken place from the point of view of
> + * catalogs, getObjectIdentity returns the object name with the already
> + * changed schema. The output of our deparsing must return the original
> + * schema name however, so we chop the schema name off the identity string
> + * and then prepend the quoted schema name.
>
> "name however," -> "name, however,"
>
> ~
>
> 3k. - deparse_GrantStmt
>
> + /* build list of privileges to grant/revoke */
> + if (istmt->all_privs)
>
> "build list" -> "build a list"
>
> ~
>
> 3l. - deparse_AlterTypeSetStmt
>
> + * Deparse an AlterTypeStmt.
> + *
> + * Given a type OID and a parsetree that modified it, return an ObjTree
> + * representing the alter type.
> + */
> +static ObjTree *
> +deparse_AlterTypeSetStmt(Oid objectId, Node *cmd)
>
> "parsetree" -> "parse tree"
>
> ~
>
> 3m. - deparse_CompositeTypeStmt
>
> + * Deparse a CompositeTypeStmt (CREATE TYPE AS)
> + *
> + * Given a type OID and the parsetree that created it, return an ObjTree
> + * representing the creation command.
> + */
>
> "parsetree" -> "parse tree"
>
> ~
>
> 3n. - deparse_CreateEnumStmt
>
> +/*
> + * Deparse a CreateEnumStmt (CREATE TYPE AS ENUM)
> + *
> + * Given a type OID and the parsetree that created it, return an ObjTree
> + * representing the creation command.
> + */
>
> "parsetree" -> "parse tree"
>
> ~
>
> 3o. - deparse_CreateExtensionStmt
>
> +/*
> + * deparse_CreateExtensionStmt
> + * deparse a CreateExtensionStmt
> + *
> + * Given an extension OID and the parsetree that created it, return the JSON
> + * blob representing the creation command.
> + *
> + */
>
> "parsetree" -> "parse tree"
>
> ~
>
> 3p. - deparse_CreateFdwStmt
>
> +/*
> + * deparse_CreateFdwStmt
> + * Deparse a CreateFdwStmt (CREATE FOREIGN DATA WRAPPER)
> + *
> + * Given a trigger OID and the parsetree that created it,
> + * return an ObjTree representing the creation command.
> + */
>
> "parsetree" -> "parse tree"
>
> ~
>
> 3q. - deparse_AlterFdwStmt
>
> +/*
> + * deparse_AlterFdwStmt
> + * Deparse an AlterFdwStmt (ALTER FOREIGN DATA WRAPPER)
> + *
> + * Given a function OID and the parsetree that create it, return the
> + * JSON blob representing the alter command.
> + */
>
> "parsetree" -> "parse tree"
>
> "create it" -> "created it"
>
> 3r.
>
> + /*
> + * Iterate through options, to see what changed, but use catalog as basis
> + * for new values.
> + */
>
> "as basis" -> "as a basis"
>
> ~
>
> 3s.
>
> +/*
> + * Deparse a CREATE TYPE AS RANGE statement
> + *
> + * Given a type OID and the parsetree that created it, return an ObjTree
> + * representing the creation command.
> + */
>
> "parsetree" -> "parse tree"
>
> ~
>
> 3t. - deparse_AlterEnumStmt
>
> +/*
> + * Deparse an AlterEnumStmt.
> + *
> + * Given a type OID and a parsetree that modified it, return an ObjTree
> + * representing the alter type.
> + */
>
> "parsetree" -> "parse tree"
>
> ~
>
> 3u. - deparse_AlterTableStmt
>
> + /*
> + * We don't support replicating ALTER TABLE which contains volatile
> + * functions because It's possible the functions contain DDL/DML in
> + * which case these opertions will be executed twice and cause
> + * duplicate data. In addition, we don't know whether the tables being
> + * accessed by these DDL/DML are published or not. So blindly allowing
> + * such functions can allow unintended clauses like the tables accessed
> + * in those functions may not even exist on the subscriber-side.
> + */
>
> "opertions" -> "operations"
>
> "subscriber-side." -> "subscriber."
>
> ~
>
> 3v. - deparse_ColumnDef
>
> + * Deparse a ColumnDef node within a regular (non typed) table creation.
>
> "non typed" -> "non-typed"
>
> ~
>
> 3w. - deparse_ColumnDef
>
> + /*
> + * Emit a NOT NULL declaration if necessary. Note that we cannot trust
> + * pg_attribute.attnotnull here, because that bit is also set when
> + * primary keys are specified; and we must not emit a NOT NULL
> + * constraint in that case, unless explicitely specified. Therefore,
> + * we scan the list of constraints attached to this column to determine
> + * whether we need to emit anything.
> + * (Fortunately, NOT NULL constraints cannot be table constraints.)
> + *
> + * In the ALTER TABLE cases, we also add a NOT NULL if the colDef is
> + * marked is_not_null.
> + */
>
> "specified; and we" -> "specified; we"
>
> "explicitely" -> "explicitly"
>
>
> ~
>
> 3x. - deparse_CreateDomain
>
> +/*
> + * Deparse the CREATE DOMAIN
> + *
> + * Given a function OID and the parsetree that created it, return the JSON
> + * blob representing the creation command.
> + */
>
> "parsetree" -> "parse tree"
>
> ~
>
> 3y. - deparse_CreateFunction
>
> +/*
> + * Deparse a CreateFunctionStmt (CREATE FUNCTION)
> + *
> + * Given a function OID and the parsetree that created it, return the JSON
> + * blob representing the creation command.
> + */
>
> "parsetree" -> "parse tree"
>
> ~
>
> 3z. - deparse_CreateFunction
>
> + /*
> + * Now iterate over each parameter in the parsetree to create the
> + * parameters array.
> + */
>
> "parsetree" -> "parse tree"
>
> ~
>
> 4a. - deparse_CreateFunction
>
> + /*
> + * A PARAM_TABLE parameter indicates end of input arguments; the
> + * following parameters are part of the return type. We ignore them
> + * here, but keep track of the current position in the list so that
> + * we can easily produce the return type below.
> + */
>
> "end" -> "the end"
>
> ~
>
> 4b. - deparse_CreateOpClassStmt
>
> + /* Don't deparse sql commands generated while creating extension */
> + if (cmd->in_extension)
> + return NULL;
>
> "sql" -> "SQL"
>
> ~
>
> 4c. - deparse_CreateOpClassStmt
>
> /*
> + * Add the FAMILY clause; but if it has the same name and namespace as the
> + * opclass, then have it expand to empty, because it would cause a failure
> + * if the opfamily was created internally.
> + */
>
> "clause; " -> "clause, "
>
> "empty," -> "empty"
>
> ~
>
> 4d. - deparse_CreateOpFamily
>
> +/*
> + * Deparse a CreateTrigStmt (CREATE OPERATOR FAMILY)
> + *
> + * Given a trigger OID and the parsetree that created it, return an ObjTree
> + * representing the creation command.
> + */
>
> "parsetree" -> "parse tree"
>
> ~
>
> 4e. - deparse_CreateSchemaStmt
>
> +/*
> + * Deparse a CreateSchemaStmt.
> + *
> + * Given a schema OID and the parsetree that created it, return an ObjTree
> + * representing the creation command.
> + *
> + */
>
> "parsetree" -> "parse tree"
>
> ~
>
> 4f. - deparse_CreateSeqStmt
>
> +/*
> + * Deparse a CreateSeqStmt.
> + *
> + * Given a sequence OID and the parsetree that create it, return an ObjTree
> + * representing the creation command.
> + */
>
> "parsetree" -> "parse tree"
>
> "create it" -> "created it"
>
> ~
>
> 4g. - deparse_CreateStmt
>
> +/*
> + * Deparse a CreateStmt (CREATE TABLE).
> + *
> + * Given a table OID and the parsetree that created it, return an ObjTree
> + * representing the creation command.
> + */
>
> "parsetree" -> "parse tree"
>
> ~
>
> 4h.
>
> + /*
> + * Typed tables and partitions use a slightly different format string: we
> + * must not put table_elements with parents directly in the fmt string,
> + * because if there are no options the parens must not be emitted; and
> + * also, typed tables do not allow for inheritance.
> + */
>
> "parens" -> "parentheses"
>
> ~
>
> 4i. - deparse_CreateStmt
>
> + /*
> + * We can't put table elements directly in the fmt string as an array
> + * surrounded by parens here, because an empty clause would cause a
> + * syntax error. Therefore, we use an indirection element and set
> + * present=false when there are no elements.
> + */
>
> "parens" -> "parentheses"
>
> ~
>
> 4j. - deparse_CreateStmt
>
> + /*
> + * Get pg_class.relpartbound. We cannot use partbound in the
> + * parsetree directly as it's the original partbound expression
> + * which haven't been transformed.
> + */
>
> "parsetree" -> "parse tree" ? maybe this one if ok if it referring to
> the parameter with this name.
>
> ~
>
> 4k. - deparse_DefineStmt_Operator
>
> +/*
> + * Deparse a DefineStmt (CREATE OPERATOR)
> + *
> + * Given a trigger OID and the parsetree that created it, return an ObjTree
> + * representing the creation command.
> + */
>
> "parsetree" -> "parse tree"
>
> ~
>
> 4l. - deparse_CreateTrigStmt
>
> +/*
> + * Deparse a CreateTrigStmt (CREATE TRIGGER)
> + *
> + * Given a trigger OID and the parsetree that created it, return an ObjTree
> + * representing the creation command.
> + */
>
> "parsetree" -> "parsetree"
>
> ~
>
> 4m. - deparse_RefreshMatViewStmt
>
> +/*
> + * Deparse a RefreshMatViewStmt (REFRESH MATERIALIZED VIEW)
> + *
> + * Given a materialized view OID and the parsetree that created it, return an
> + * ObjTree representing the refresh command.
> + */
>
> "parseree" -> "parse tree"
>
> ~
>
> 4n. - deparse_IndexStmt
>
> +/*
> + * Deparse an IndexStmt.
> + *
> + * Given an index OID and the parsetree that created it, return an ObjTree
> + * representing the creation command.
> + *
> + * If the index corresponds to a constraint, NULL is returned.
> + */
>
> "parsetree" -> "parse tree"
>
> ~
>
> 4o. - deparse_InhRelations
>
> +/*
> + * Deparse the INHERITS relations.
> + *
> + * Given a table OID, return a schema qualified table list representing
> + * the parent tables.
> + */
> +static List *
> +deparse_InhRelations(Oid objectId)
>
> "schema qualified" -> "schema-qualified"
>
> ~
>
> 4p. - deparse_OnCommitClause
>
> +/*
> + * Deparse the ON COMMMIT ... clause for CREATE ... TEMPORARY ...
> + */
> +static ObjTree *
> +deparse_OnCommitClause(OnCommitAction option)
>
> "COMMMIT" -> "COMMIT"
>
> ~
>
> 4q. - deparse_RuleStmt
>
> +/*
> + * Deparse a RuleStmt (CREATE RULE).
> + *
> + * Given a rule OID and the parsetree that created it, return an ObjTree
> + * representing the rule command.
> + */
>
> "parsetree" -> "parse tree"
>
> ~
>
> 4r. - deparse_utility_command
>
> + /*
> + * Allocate everything done by the deparsing routines into a temp context,
> + * to avoid having to sprinkle them with memory handling code; but allocate
> + * the output StringInfo before switching.
> + */
>
> "code; " -> "code, "
>
> ~
>
> 4s. - deparse_utility_command
>
> + /*
> + * Many routines underlying this one will invoke ruleutils.c functionality
> + * in order to obtain deparsed versions of expressions. In such results,
> + * we want all object names to be qualified, so that results are "portable"
> + * to environments with different search_path settings. Rather than inject
> + * what would be repetitive calls to override search path all over the
> + * place, we do it centrally here.
> + */
>
> "in order to" => "to"
>
> ~
>
> 4t. - convSpecifier
>
> +/*
> + * Conversion specifier, it determines how we expand the json element into
> + * string.
> + */
>
> SUGGESTION
> Conversion specifier which determines how we expand the json element
> into a string.
>
> ~
>
> 4u. - json_trivalue
>
> +/*
> + * A ternary value which represents a boolean type JsonbValue.
> + */
>
> "which represents" -> "that represents"
>
> ~
>
> 4v. - expand_fmt_recursive
>
> + /*
> + * Scan the mandatory element name. Allow for an array separator
> + * (which may be the empty string) to be specified after colon.
> + */
>
> "after colon" -> "after a colon"
>
> ~
>
> 4w. - expand_jsonval_string
>
> +/*
> + * Expand a json value as a string. The value must be of type string or of
> + * type object. In the latter case it must contain a "fmt" element which will
> + * be recursively expanded; also, if the object contains an element "present"
> + * and it is set to false, the expansion is the empty string.
> + *
> + * Returns false if no actual expansion was made due to the "present" flag
> + * being set to "false".
> + */
>
> "latter case" -> "latter case,"
>
> ~
>
> 4x. - format_procedure_args_internal
>
> +/*
> + * Append the parenthised arguments of the given pg_proc row into the output
> + * buffer. force_qualify indicates whether to schema-qualify type names
> + * regardless of visibility.
> + */
>
> "parenthised" -> "parenthesized "
>
> ~
>
> 4y.
>
> +/*
> + * Internal version that returns definition of a CONSTRAINT command
> + */
>
> "definition" -> "the definition"
>
> ======
>
> 5. Function comment inconsistencies
>
> 5a.
>
> Sometimes the function name is repeated in the comment and sometimes it is not.
>
> e.g. compare deparse_CreateEnumStmt() versus deparse_CreateExtensionStmt(), etc.
>
> (IMO there is no need to repeat the function name)
>
> ~
>
> 5b.
>
> Sometimes the deparse function comments are verbose and say something like:
>
> + * Given a type OID and a parsetree that modified it, return an ObjTree
> + * representing the alter type.
>
> but sometimes - like deparse_AlterExtensionStmt() etc. - they don't
> bother to say anything at all.
>
> e.g.
> +/*
> + * Deparse ALTER EXTENSION .. UPDATE TO VERSION
> + */
> +static ObjTree *
> +deparse_AlterExtensionStmt(Oid objectId, Node *parsetree)
>
> Either it is useful, so say it always, or it is not useful, so say it
> never. Pick one.
>
> ------
>
> 6. GENERAL - json
>
> IMO change "json" -> "JSON" everywhere.
>
> Here are some examples:
>
> Line 7605: + * Conversion specifier, it determines how we expand the
> json element into
> Line 7699: + errmsg("missing element \"%s\" in json object", keyname)));
> Line 7857: + * Expand a json value as a quoted identifier. The value
> must be of type string.
> Line 7872: + * Expand a json value as a dot-separated-name. The value
> must be of type
> Line 7908: + * Expand a json value as a type name.
> Line 7966: + * Expand a json value as an operator name
> Line 7993: + * Expand a json value as a string. The value must be of
> type string or of
> Line 8031: + * Expand a json value as a string literal.
> Line 8070: + * Expand a json value as an integer quantity.
> Line 8083: + * Expand a json value as a role name. If the is_public
> element is set to
> Line 8111: + * Expand one json element into the output StringInfo
> according to the
> Line 8807: +{ oid => '4642', descr => 'deparse the DDL command into
> json format string',
> Line 8810: +{ oid => '4643', descr => 'expand json format DDL to a
> plain DDL command',
>
> ------
I've addressed all the above comments.
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v28-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v28-0002-Support-DDL-replication.patch | application/octet-stream | 131.7 KB |
| v28-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v28-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 293.4 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-17 06:11:06 |
| Message-ID: | CALDaNm3+EmzCD6+0y7wTes9syHOdTgFchVrOV_X71UQQRv7EUA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, 13 Oct 2022 at 18:16, Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Thu, Oct 6, 2022 at 7:31 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > The patches here are quite large, so for this review post, I have only
> > done a quick check for cosmetic stuff in the comments of patch
> > v24-0001.
> >
> > ~
> >
> > I did this mostly just by cutting/pasting the whole patch text into a
> > grammar/spell-checker to see what it reported. Please use the same
> > strategy prior to posting future patch versions, because it will be
> > way more efficient/easier for the author to spend a few minutes to fix
> > everything like this up-front before posting, rather than getting a
> > flood of review comments to deal with (like this post) about such
> > stuff.
> >
> > (BTW, most of these suggestions are just verbatim what my
> > grammar/spell-checker told me)
> >
> > ======
> >
> > 1. Commit comment
> >
> > (Note #2) Note that, for ATTACH/DETACH PARTITION, we haven't added
> > extra logic on
> > subscriber to handle the case where the table on publisher is a PARTITIONED
> > TABLE while the target table on subcriber side is NORMAL table. We will
> > research this more and improve this later.
> >
> > SUGGESTION (minor changes + fix typo)
> > (Note #2) Note that for ATTACH/DETACH PARTITION we haven't added extra logic on
> > the subscriber to handle the case where the table on the publisher is
> > a PARTITIONED
> > TABLE while the target table on the subscriber is a NORMAL table. We will
> > research this more and improve it later.
> >
> > ======
> >
> > 2. GENERAL - uppercase the comments
> >
> > Please make all your single-line comments start with uppercase for consistency.
> >
> > Here are some examples to fix:
> >
> > Line 303: + /* add the "ON table" clause */
> > Line 331: + /* add the USING clause, if any */
> > Line 349: + /* add the WITH CHECK clause, if any */
> > Line 653: + /* otherwise, WITH TZ is added by typmod. */
> > Line 663: + /* otherwise, WITH TZ is added by typmode. */
> > Line 1946: + /* build list of privileges to grant/revoke */
> > Line 2017: + /* target objects. We use object identities here */
> > Line 2041: + /* list of grantees */
> > Line 2053: + /* the wording of the grant option is variable ... */
> > Line 2632: + /* skip this one; we add one unconditionally below */
> > Line 2660: + /* done */
> > Line 2768: + /* add HANDLER clause */
> > Line 2780: + /* add VALIDATOR clause */
> > Line 2792: + /* add an OPTIONS clause, if any */
> > Line 2845: + /* add HANDLER clause */
> > Line 2859: + /* add VALIDATOR clause */
> > Line 2877: + /* add an OPTIONS clause, if any */
> > Line 5024: + /* add the rest of the stuff */
> > Line 5040: + /* add the rest of the stuff */
> > Line 5185: + /* a new constraint has a name and definition */
> > Line 5211: + /* done */
> > Line 6124: + /* add a CONCURRENTLY clause */
> > Line 6127: + /* add the matview name */
> > Line 6131: + /* add a WITH NO DATA clause */
> > Line 6302: + /* reloptions */
> > Line 6310: + /* tablespace */
> > Line 6592: + /* deconstruct the name list */
> > Line 6636: + /* deconstruct the name list */
> > Line 6725: + /* obtain object tuple */
> > Line 6729: + /* obtain namespace */
> >
> > ------
> >
> > 3. Grammar/Spelling
> >
> > 3a. - format_type_detailed
> >
> > + * - nspid is the schema OID. For certain SQL-standard types which have weird
> > + * typmod rules, we return InvalidOid; caller is expected to not schema-
> > + * qualify the name nor add quotes to the type name in this case.
> >
> > "caller" -> "the caller"
> >
> > ~
> >
> > 3b. - format_type_detailed
> >
> > + * - typemodstr is set to the typemod, if any, as a string with parens
> >
> > "parens" -> "parentheses"
> >
> > ~
> >
> > 3c. - format_type_detailed
> >
> > + else
> > + /* otherwise, WITH TZ is added by typmode. */
> > + *typname = pstrdup("TIME");
> >
> > "typmode" -> "typmod" ?
> >
> > ~
> >
> > 3d. - new_objtree_for_qualname
> >
> > + * A helper routine to setup %{}D and %{}O elements.
> >
> > "setup" -> "set up"
> >
> > ~
> >
> > 3e. - new_objtree_for_type
> >
> > +/*
> > + * A helper routine to setup %{}T elements.
> > + */
> > +static ObjTree *
> > +new_objtree_for_type(Oid typeId, int32 typmod)
> >
> > "setup" -> "set up"
> >
> > ~
> >
> > 3f. - pg_get_indexdef_detailed
> > +/*
> > + * Return an index definition, split in several pieces.
> > + *
> > + * A large amount of code is duplicated from pg_get_indexdef_worker, but
> > + * control flow is different enough that it doesn't seem worth keeping them
> > + * together.
> > + */
> > +static void
> > +pg_get_indexdef_detailed(Oid indexrelid,
> >
> > "split in" -> "split into"
> >
> > ~
> >
> > 3g. - ddl_deparse_to_json
> >
> > + * The command is expanded fully, so that there are no ambiguities even in the
> > + * face of search_path changes.
> > + */
> > +Datum
> > +ddl_deparse_to_json(PG_FUNCTION_ARGS)
> >
> > "fully, so" -> "fully so"
> >
> > ~
> >
> > 3h. -deparse_AlterFunction
> >
> > + * Given a function OID and the parsetree that created it, return the JSON
> > + * blob representing the alter command.
> > + */
> >
> > "the parsetree" -> "the parse tree"
> >
> > ~
> >
> > 3i. - deparse_AlterObjectSchemaStmt
> >
> > +/*
> > + * deparse an ALTER ... SET SCHEMA command.
> > + */
> > +static ObjTree *
> > +deparse_AlterObjectSchemaStmt(ObjectAddress address, Node *parsetree,
> >
> > "deparse" -> "Deparse"
> >
> > ~
> >
> > 3j. deparse_AlterObjectSchemaStmt
> >
> > + /*
> > + * Since the command has already taken place from the point of view of
> > + * catalogs, getObjectIdentity returns the object name with the already
> > + * changed schema. The output of our deparsing must return the original
> > + * schema name however, so we chop the schema name off the identity string
> > + * and then prepend the quoted schema name.
> >
> > "name however," -> "name, however,"
> >
> > ~
> >
> > 3k. - deparse_GrantStmt
> >
> > + /* build list of privileges to grant/revoke */
> > + if (istmt->all_privs)
> >
> > "build list" -> "build a list"
> >
> > ~
> >
> > 3l. - deparse_AlterTypeSetStmt
> >
> > + * Deparse an AlterTypeStmt.
> > + *
> > + * Given a type OID and a parsetree that modified it, return an ObjTree
> > + * representing the alter type.
> > + */
> > +static ObjTree *
> > +deparse_AlterTypeSetStmt(Oid objectId, Node *cmd)
> >
> > "parsetree" -> "parse tree"
> >
> > ~
> >
> > 3m. - deparse_CompositeTypeStmt
> >
> > + * Deparse a CompositeTypeStmt (CREATE TYPE AS)
> > + *
> > + * Given a type OID and the parsetree that created it, return an ObjTree
> > + * representing the creation command.
> > + */
> >
> > "parsetree" -> "parse tree"
> >
> > ~
> >
> > 3n. - deparse_CreateEnumStmt
> >
> > +/*
> > + * Deparse a CreateEnumStmt (CREATE TYPE AS ENUM)
> > + *
> > + * Given a type OID and the parsetree that created it, return an ObjTree
> > + * representing the creation command.
> > + */
> >
> > "parsetree" -> "parse tree"
> >
> > ~
> >
> > 3o. - deparse_CreateExtensionStmt
> >
> > +/*
> > + * deparse_CreateExtensionStmt
> > + * deparse a CreateExtensionStmt
> > + *
> > + * Given an extension OID and the parsetree that created it, return the JSON
> > + * blob representing the creation command.
> > + *
> > + */
> >
> > "parsetree" -> "parse tree"
> >
> > ~
> >
> > 3p. - deparse_CreateFdwStmt
> >
> > +/*
> > + * deparse_CreateFdwStmt
> > + * Deparse a CreateFdwStmt (CREATE FOREIGN DATA WRAPPER)
> > + *
> > + * Given a trigger OID and the parsetree that created it,
> > + * return an ObjTree representing the creation command.
> > + */
> >
> > "parsetree" -> "parse tree"
> >
> > ~
> >
> > 3q. - deparse_AlterFdwStmt
> >
> > +/*
> > + * deparse_AlterFdwStmt
> > + * Deparse an AlterFdwStmt (ALTER FOREIGN DATA WRAPPER)
> > + *
> > + * Given a function OID and the parsetree that create it, return the
> > + * JSON blob representing the alter command.
> > + */
> >
> > "parsetree" -> "parse tree"
> >
> > "create it" -> "created it"
> >
> > 3r.
> >
> > + /*
> > + * Iterate through options, to see what changed, but use catalog as basis
> > + * for new values.
> > + */
> >
> > "as basis" -> "as a basis"
> >
> > ~
> >
> > 3s.
> >
> > +/*
> > + * Deparse a CREATE TYPE AS RANGE statement
> > + *
> > + * Given a type OID and the parsetree that created it, return an ObjTree
> > + * representing the creation command.
> > + */
> >
> > "parsetree" -> "parse tree"
> >
> > ~
> >
> > 3t. - deparse_AlterEnumStmt
> >
> > +/*
> > + * Deparse an AlterEnumStmt.
> > + *
> > + * Given a type OID and a parsetree that modified it, return an ObjTree
> > + * representing the alter type.
> > + */
> >
> > "parsetree" -> "parse tree"
> >
> > ~
> >
> > 3u. - deparse_AlterTableStmt
> >
> > + /*
> > + * We don't support replicating ALTER TABLE which contains volatile
> > + * functions because It's possible the functions contain DDL/DML in
> > + * which case these opertions will be executed twice and cause
> > + * duplicate data. In addition, we don't know whether the tables being
> > + * accessed by these DDL/DML are published or not. So blindly allowing
> > + * such functions can allow unintended clauses like the tables accessed
> > + * in those functions may not even exist on the subscriber-side.
> > + */
> >
> > "opertions" -> "operations"
> >
> > "subscriber-side." -> "subscriber."
> >
> > ~
> >
> > 3v. - deparse_ColumnDef
> >
> > + * Deparse a ColumnDef node within a regular (non typed) table creation.
> >
> > "non typed" -> "non-typed"
> >
> > ~
> >
> > 3w. - deparse_ColumnDef
> >
> > + /*
> > + * Emit a NOT NULL declaration if necessary. Note that we cannot trust
> > + * pg_attribute.attnotnull here, because that bit is also set when
> > + * primary keys are specified; and we must not emit a NOT NULL
> > + * constraint in that case, unless explicitely specified. Therefore,
> > + * we scan the list of constraints attached to this column to determine
> > + * whether we need to emit anything.
> > + * (Fortunately, NOT NULL constraints cannot be table constraints.)
> > + *
> > + * In the ALTER TABLE cases, we also add a NOT NULL if the colDef is
> > + * marked is_not_null.
> > + */
> >
> > "specified; and we" -> "specified; we"
> >
> > "explicitely" -> "explicitly"
> >
> >
> > ~
> >
> > 3x. - deparse_CreateDomain
> >
> > +/*
> > + * Deparse the CREATE DOMAIN
> > + *
> > + * Given a function OID and the parsetree that created it, return the JSON
> > + * blob representing the creation command.
> > + */
> >
> > "parsetree" -> "parse tree"
> >
> > ~
> >
> > 3y. - deparse_CreateFunction
> >
> > +/*
> > + * Deparse a CreateFunctionStmt (CREATE FUNCTION)
> > + *
> > + * Given a function OID and the parsetree that created it, return the JSON
> > + * blob representing the creation command.
> > + */
> >
> > "parsetree" -> "parse tree"
> >
> > ~
> >
> > 3z. - deparse_CreateFunction
> >
> > + /*
> > + * Now iterate over each parameter in the parsetree to create the
> > + * parameters array.
> > + */
> >
> > "parsetree" -> "parse tree"
> >
> > ~
> >
> > 4a. - deparse_CreateFunction
> >
> > + /*
> > + * A PARAM_TABLE parameter indicates end of input arguments; the
> > + * following parameters are part of the return type. We ignore them
> > + * here, but keep track of the current position in the list so that
> > + * we can easily produce the return type below.
> > + */
> >
> > "end" -> "the end"
> >
> > ~
> >
> > 4b. - deparse_CreateOpClassStmt
> >
> > + /* Don't deparse sql commands generated while creating extension */
> > + if (cmd->in_extension)
> > + return NULL;
> >
> > "sql" -> "SQL"
> >
> > ~
> >
> > 4c. - deparse_CreateOpClassStmt
> >
> > /*
> > + * Add the FAMILY clause; but if it has the same name and namespace as the
> > + * opclass, then have it expand to empty, because it would cause a failure
> > + * if the opfamily was created internally.
> > + */
> >
> > "clause; " -> "clause, "
> >
> > "empty," -> "empty"
> >
> > ~
> >
> > 4d. - deparse_CreateOpFamily
> >
> > +/*
> > + * Deparse a CreateTrigStmt (CREATE OPERATOR FAMILY)
> > + *
> > + * Given a trigger OID and the parsetree that created it, return an ObjTree
> > + * representing the creation command.
> > + */
> >
> > "parsetree" -> "parse tree"
> >
> > ~
> >
> > 4e. - deparse_CreateSchemaStmt
> >
> > +/*
> > + * Deparse a CreateSchemaStmt.
> > + *
> > + * Given a schema OID and the parsetree that created it, return an ObjTree
> > + * representing the creation command.
> > + *
> > + */
> >
> > "parsetree" -> "parse tree"
> >
> > ~
> >
> > 4f. - deparse_CreateSeqStmt
> >
> > +/*
> > + * Deparse a CreateSeqStmt.
> > + *
> > + * Given a sequence OID and the parsetree that create it, return an ObjTree
> > + * representing the creation command.
> > + */
> >
> > "parsetree" -> "parse tree"
> >
> > "create it" -> "created it"
> >
> > ~
> >
> > 4g. - deparse_CreateStmt
> >
> > +/*
> > + * Deparse a CreateStmt (CREATE TABLE).
> > + *
> > + * Given a table OID and the parsetree that created it, return an ObjTree
> > + * representing the creation command.
> > + */
> >
> > "parsetree" -> "parse tree"
> >
> > ~
> >
> > 4h.
> >
> > + /*
> > + * Typed tables and partitions use a slightly different format string: we
> > + * must not put table_elements with parents directly in the fmt string,
> > + * because if there are no options the parens must not be emitted; and
> > + * also, typed tables do not allow for inheritance.
> > + */
> >
> > "parens" -> "parentheses"
> >
> > ~
> >
> > 4i. - deparse_CreateStmt
> >
> > + /*
> > + * We can't put table elements directly in the fmt string as an array
> > + * surrounded by parens here, because an empty clause would cause a
> > + * syntax error. Therefore, we use an indirection element and set
> > + * present=false when there are no elements.
> > + */
> >
> > "parens" -> "parentheses"
> >
> > ~
> >
> > 4j. - deparse_CreateStmt
> >
> > + /*
> > + * Get pg_class.relpartbound. We cannot use partbound in the
> > + * parsetree directly as it's the original partbound expression
> > + * which haven't been transformed.
> > + */
> >
> > "parsetree" -> "parse tree" ? maybe this one if ok if it referring to
> > the parameter with this name.
> >
> > ~
> >
> > 4k. - deparse_DefineStmt_Operator
> >
> > +/*
> > + * Deparse a DefineStmt (CREATE OPERATOR)
> > + *
> > + * Given a trigger OID and the parsetree that created it, return an ObjTree
> > + * representing the creation command.
> > + */
> >
> > "parsetree" -> "parse tree"
> >
> > ~
> >
> > 4l. - deparse_CreateTrigStmt
> >
> > +/*
> > + * Deparse a CreateTrigStmt (CREATE TRIGGER)
> > + *
> > + * Given a trigger OID and the parsetree that created it, return an ObjTree
> > + * representing the creation command.
> > + */
> >
> > "parsetree" -> "parsetree"
> >
> > ~
> >
> > 4m. - deparse_RefreshMatViewStmt
> >
> > +/*
> > + * Deparse a RefreshMatViewStmt (REFRESH MATERIALIZED VIEW)
> > + *
> > + * Given a materialized view OID and the parsetree that created it, return an
> > + * ObjTree representing the refresh command.
> > + */
> >
> > "parseree" -> "parse tree"
> >
> > ~
> >
> > 4n. - deparse_IndexStmt
> >
> > +/*
> > + * Deparse an IndexStmt.
> > + *
> > + * Given an index OID and the parsetree that created it, return an ObjTree
> > + * representing the creation command.
> > + *
> > + * If the index corresponds to a constraint, NULL is returned.
> > + */
> >
> > "parsetree" -> "parse tree"
> >
> > ~
> >
> > 4o. - deparse_InhRelations
> >
> > +/*
> > + * Deparse the INHERITS relations.
> > + *
> > + * Given a table OID, return a schema qualified table list representing
> > + * the parent tables.
> > + */
> > +static List *
> > +deparse_InhRelations(Oid objectId)
> >
> > "schema qualified" -> "schema-qualified"
> >
> > ~
> >
> > 4p. - deparse_OnCommitClause
> >
> > +/*
> > + * Deparse the ON COMMMIT ... clause for CREATE ... TEMPORARY ...
> > + */
> > +static ObjTree *
> > +deparse_OnCommitClause(OnCommitAction option)
> >
> > "COMMMIT" -> "COMMIT"
> >
> > ~
> >
> > 4q. - deparse_RuleStmt
> >
> > +/*
> > + * Deparse a RuleStmt (CREATE RULE).
> > + *
> > + * Given a rule OID and the parsetree that created it, return an ObjTree
> > + * representing the rule command.
> > + */
> >
> > "parsetree" -> "parse tree"
> >
> > ~
> >
> > 4r. - deparse_utility_command
> >
> > + /*
> > + * Allocate everything done by the deparsing routines into a temp context,
> > + * to avoid having to sprinkle them with memory handling code; but allocate
> > + * the output StringInfo before switching.
> > + */
> >
> > "code; " -> "code, "
> >
> > ~
> >
> > 4s. - deparse_utility_command
> >
> > + /*
> > + * Many routines underlying this one will invoke ruleutils.c functionality
> > + * in order to obtain deparsed versions of expressions. In such results,
> > + * we want all object names to be qualified, so that results are "portable"
> > + * to environments with different search_path settings. Rather than inject
> > + * what would be repetitive calls to override search path all over the
> > + * place, we do it centrally here.
> > + */
> >
> > "in order to" => "to"
> >
> > ~
> >
> > 4t. - convSpecifier
> >
> > +/*
> > + * Conversion specifier, it determines how we expand the json element into
> > + * string.
> > + */
> >
> > SUGGESTION
> > Conversion specifier which determines how we expand the json element
> > into a string.
> >
> > ~
> >
> > 4u. - json_trivalue
> >
> > +/*
> > + * A ternary value which represents a boolean type JsonbValue.
> > + */
> >
> > "which represents" -> "that represents"
> >
> > ~
> >
> > 4v. - expand_fmt_recursive
> >
> > + /*
> > + * Scan the mandatory element name. Allow for an array separator
> > + * (which may be the empty string) to be specified after colon.
> > + */
> >
> > "after colon" -> "after a colon"
> >
> > ~
> >
> > 4w. - expand_jsonval_string
> >
> > +/*
> > + * Expand a json value as a string. The value must be of type string or of
> > + * type object. In the latter case it must contain a "fmt" element which will
> > + * be recursively expanded; also, if the object contains an element "present"
> > + * and it is set to false, the expansion is the empty string.
> > + *
> > + * Returns false if no actual expansion was made due to the "present" flag
> > + * being set to "false".
> > + */
> >
> > "latter case" -> "latter case,"
> >
> > ~
> >
> > 4x. - format_procedure_args_internal
> >
> > +/*
> > + * Append the parenthised arguments of the given pg_proc row into the output
> > + * buffer. force_qualify indicates whether to schema-qualify type names
> > + * regardless of visibility.
> > + */
> >
> > "parenthised" -> "parenthesized "
> >
> > ~
> >
> > 4y.
> >
> > +/*
> > + * Internal version that returns definition of a CONSTRAINT command
> > + */
> >
> > "definition" -> "the definition"
> >
> > ======
> >
> > 5. Function comment inconsistencies
> >
> > 5a.
> >
> > Sometimes the function name is repeated in the comment and sometimes it is not.
> >
> > e.g. compare deparse_CreateEnumStmt() versus deparse_CreateExtensionStmt(), etc.
> >
> > (IMO there is no need to repeat the function name)
> >
> > ~
> >
> > 5b.
> >
> > Sometimes the deparse function comments are verbose and say like:
> >
> > + * Given a type OID and a parsetree that modified it, return an ObjTree
> > + * representing the alter type.
> >
> > but sometimes - like deparse_AlterExtensionStmt() etc. - they don't
> > bother to say anything at all.
> >
> > e.g.
> > +/*
> > + * Deparse ALTER EXTENSION .. UPDATE TO VERSION
> > + */
> > +static ObjTree *
> > +deparse_AlterExtensionStmt(Oid objectId, Node *parsetree)
> >
> > Either it is useful, so say it always, or it is not useful, so say it
> > never. Pick one.
> >
> > ------
> >
> > 6. GENERAL - json
> >
> > IMO change "json" -> "JSON" everywhere.
> >
> > Here are some examples:
> >
> > Line 7605: + * Conversion specifier, it determines how we expand the
> > json element into
> > Line 7699: + errmsg("missing element \"%s\" in json object", keyname)));
> > Line 7857: + * Expand a json value as a quoted identifier. The value
> > must be of type string.
> > Line 7872: + * Expand a json value as a dot-separated-name. The value
> > must be of type
> > Line 7908: + * Expand a json value as a type name.
> > Line 7966: + * Expand a json value as an operator name
> > Line 7993: + * Expand a json value as a string. The value must be of
> > type string or of
> > Line 8031: + * Expand a json value as a string literal.
> > Line 8070: + * Expand a json value as an integer quantity.
> > Line 8083: + * Expand a json value as a role name. If the is_public
> > element is set to
> > Line 8111: + * Expand one json element into the output StringInfo
> > according to the
> > Line 8807: +{ oid => '4642', descr => 'deparse the DDL command into
> > json format string',
> > Line 8810: +{ oid => '4643', descr => 'expand json format DDL to a
> > plain DDL command',
> >
> > ------
>
> I've addressed all the above comments.
Headercheck was failing in cfbot; the attached patch fixes this.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v28-0004-Test-cases-for-DDL-replication.patch | text/x-patch | 24.6 KB |
| v28-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | text/x-patch | 15.0 KB |
| v28-0002-Support-DDL-replication.patch | text/x-patch | 132.0 KB |
| v28-0001-Functions-to-deparse-DDL-commands.patch | text/x-patch | 293.7 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "runqidev(at)gmail(dot)com" <runqidev(at)gmail(dot)com> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-18 01:08:55 |
| Message-ID: | CAAD30UKKkfPLRohPGa6XXJD9v1ZhRK3==3xNnqT0hDNtn5A+mg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hi,
Attaching v29 of the patch set, which adds support for VIEW and
MATERIALIZED VIEW commands:
CREATE/ALTER/DROP VIEW
CREATE/ALTER/DROP MATERIALIZED VIEW
Note that for VIEWSTMT we created the function pg_get_viewdef_internal() to
get the SELECT query of the pending VIEWSTMT. While the CREATE VIEW command
is still in progress, we need to search the system cache RULERELNAME to get
the rewrite rule of the view, as opposed to querying pg_rewrite as in
pg_get_viewdef_worker(); the latter would return an empty result.
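For reference, here is a minimal sketch of that syscache lookup (the variable
names are illustrative, not the exact patch code):

HeapTuple	ruletup;
Datum		ev_action;
bool		isnull;

/* Look up the view's "_RETURN" rewrite rule directly in the syscache */
ruletup = SearchSysCache2(RULERELNAME,
						  ObjectIdGetDatum(viewoid),
						  PointerGetDatum(ViewSelectRuleName));
if (!HeapTupleIsValid(ruletup))
	elog(ERROR, "cache lookup failed for rewrite rule of view %u", viewoid);

ev_action = SysCacheGetAttr(RULERELNAME, ruletup,
							Anum_pg_rewrite_ev_action, &isnull);
/* ... deparse the view's SELECT query from ev_action ... */
ReleaseSysCache(ruletup);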
Thanks to Runqi Tian for providing the changes for CREATE MATERIALIZED VIEW.
Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| v29-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v29-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v29-0002-Support-DDL-replication.patch | application/octet-stream | 132.3 KB |
| v29-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 300.6 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-19 05:48:05 |
| Message-ID: | CALDaNm08gZq9a7xnsbaJMmHmi29_kbEuyShHHfxAKLXPh6btWQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, 12 Oct 2022 at 11:38, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Tue, Oct 11, 2022 at 7:00 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
>
> I was going through 0001 patch and I have a few comments/suggestions.
>
> 1.
>
> @@ -6001,7 +6001,7 @@ getObjectIdentityParts(const ObjectAddress *object,
> transformType = format_type_be_qualified(transform->trftype);
> transformLang = get_language_name(transform->trflang, false);
>
> - appendStringInfo(&buffer, "for %s on language %s",
> + appendStringInfo(&buffer, "for %s language %s",
> transformType,
> transformLang);
>
>
> How is this change related to this patch?
This change is required for the DDL deparsing of TRANSFORM commands; we have
added a note about it in the patch:
Removed an undesirable 'on' from the identity string for TRANSFORM in
getObjectIdentityParts. This is needed to make deparse of DROP
TRANSFORM command work since 'on' is not present in the current DROP
TRANSFORM syntax. For example, the correct syntax is
drop transform trf for int language sql;
instead of
drop transform trf for int on language sql;
> 2.
> +typedef struct ObjTree
> +{
> + slist_head params;
> + int numParams;
> + StringInfo fmtinfo;
> + bool present;
> +} ObjTree;
> +
> +typedef struct ObjElem
> +{
> + char *name;
> + ObjType objtype;
> +
> + union
> + {
> + bool boolean;
> + char *string;
> + int64 integer;
> + float8 flt;
> + ObjTree *object;
> + List *array;
> + } value;
> + slist_node node;
> +} ObjElem;
>
> It would be good to explain these structure members from readability pov.
Modified
> 3.
>
> +
> +bool verbose = true;
> +
>
> I do not understand the usage of such global variables. Even if it is
> required, add some comments to explain the purpose of it.
Modified
>
> 4.
> +/*
> + * Given a CollectedCommand, return a JSON representation of it.
> + *
> + * The command is expanded fully, so that there are no ambiguities even in the
> + * face of search_path changes.
> + */
> +Datum
> +ddl_deparse_to_json(PG_FUNCTION_ARGS)
> +{
>
> It will be nice to have a test case for this utility function.
We are discussing how to test this in a separate thread at [1]. We will
implement the tests accordingly once that discussion is concluded; this
comment will be handled at that time.
[1] - https://www.postgresql.org/message-id/flat/CAH8n8_jMTunxxtP4L-3tc%3DGNamg%3Dmg1X%3DtgHr9CqqjjzFLwQng%40mail.gmail.com
This patch also includes changes for replication of:
CREATE/ALTER/DROP STATISTICS
and pgindent fixes for the ddl replication code.
Thanks for the comments; the attached v30 patch includes these changes.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v30-0001-Functions-to-deparse-DDL-commands.patch | text/x-patch | 305.7 KB |
| v30-0002-Support-DDL-replication.patch | text/x-patch | 132.7 KB |
| v30-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | text/x-patch | 15.0 KB |
| v30-0004-Test-cases-for-DDL-replication.patch | text/x-patch | 24.6 KB |
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-24 12:29:34 |
| Message-ID: | CAFPTHDbGRpXOF-j+oemm=xkzjrh-ZnqPV7SMS_+QBPYdUwhWxg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, Oct 19, 2022 at 4:48 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> This patch also includes changes for replication of:
> CREATE/ALTER/DROP STATISTICS
> and pgindent fixes for the ddl replication code.
> Thanks for the comments, the attached v30 patch has the changes for the same.
>
Adding support for deparsing of CREATE/ALTER/DROP LANGUAGE for ddl replication.
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v31-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v31-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v31-0002-Support-DDL-replication.patch | application/octet-stream | 132.5 KB |
| v31-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 307.6 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-26 20:39:24 |
| Message-ID: | CAAD30UL=wV7pvdwJ6w5UQUc3urHs6W2vnomK2MKUyek3rzsKcA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> Adding support for deparsing of CREATE/ALTER/DROP LANGUAGE for ddl replication.
Adding support for deparsing of:
COMMENT
ALTER DEFAULT PRIVILEGES
CREATE/DROP ACCESS METHOD
Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| v32-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v32-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v32-0002-Support-DDL-replication.patch | application/octet-stream | 133.0 KB |
| v32-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 317.6 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-27 10:32:21 |
| Message-ID: | CALDaNm3US8AysFPf6f6vLSct60fQuyN7g5TvQTQ0RR2O8tA3tw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, 27 Oct 2022 at 02:09, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> > Adding support for deparsing of CREATE/ALTER/DROP LANGUAGE for ddl replication.
>
> Adding support for deparsing of:
> COMMENT
> ALTER DEFAULT PRIVILEGES
> CREATE/DROP ACCESS METHOD
Adding support for deparsing of:
ALTER/DROP ROUTINE
The patch also includes fixes for the following issues:
ALTER FUNCTION with function arguments
Setting "RETURNS NULL ON NULL INPUT"
Setting "SECURITY DEFINER"
Setting "SECURITY INVOKER"
Setting "LEAKPROOF"
Setting "NOT LEAKPROOF"
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v33-0001-Functions-to-deparse-DDL-commands.patch | text/x-patch | 317.8 KB |
| v33-0002-Support-DDL-replication.patch | text/x-patch | 133.1 KB |
| v33-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | text/x-patch | 15.0 KB |
| v33-0004-Test-cases-for-DDL-replication.patch | text/x-patch | 24.6 KB |
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-27 23:18:38 |
| Message-ID: | CAHut+PuSwLYTvo042SXDrSsCcmj7BPqxJFP4hvUQ5KMmC2L5Cw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hi, authors on this thread.
The patch v32-0001 is very large, so it will take some time to review
the code in detail.
Meanwhile, here are some general comments about the patch:
======
1. It might be useful to add this thread to the commitfest, if only so
the cfbot can discover the latest patch set and alert about any rebase
problems.
~~~
2. User interface design/documentation?
Please consider adding user interface documentation, so it is
available for review sooner rather than later. This comment was previously
posted 2 weeks ago [1] but got no replies.
I can only guess (from the test of patch 0004) that the idea is to use
another 'ddl' option for the 'publish' parameter:
CREATE PUBLICATION mypub FOR ALL TABLES with (publish = 'insert,
update, delete, ddl');
My first impression is that a blanket option like that could be
painful for some users who DO (for example) want the convenience of
the DDL replication automagically creating new tables on the fly, but
maybe they do NOT want the side-effect of replicating every other kind
of DDL as well.
Maybe such scenarios can be handled by another publication parameter
which can allow more fine-grained DDL replication like:
CREATE PUBLICATION mypub FOR ALL TABLES WITH (ddl = 'tables')
I also have lots of usability questions but probably the documentation
would give the answers to those. IMO the docs for the user interface
and runtime behaviours should not be deferred - they should form part
of this patch 0001.
~~~
3. Why all-or-nothing?
The current strategy for this thread appears to be to implement
*everything* in the underlying code, and then figure out what to do
with it. I'm not sure if the all-or-nothing approach is best - it just
feels risky to me, so I hope it will not result in a ton of work that
ends up unused.
Why not instead implement just some core set of the DDL replications
that are the most wanted ones (e.g. create/alter/drop
tables/whatever?) and try to get that subset committed first? Then the
remainder can be added incrementally. But this harks back to comment
#2: the user interface would need to allow flexibility to do it like
this.
~~~
4. Function name inconsistency.
Even if it was not obvious from the posts, it is clear there are
multiple authors. As an example, the file
src/backend/commands/ddl_deparse.c is 9200+ lines (and growing
rapidly) and the functions in this module are a mixture of many
different naming conventions and they seem scattered around the source
file in different ways.
I suggest this all needs to be brought under some control ASAP, by
introducing some strict naming convention and sticking to it.
For example, you might do something like:
* xxx_util_foo()
* xxx_util_bah()
* xxx_deparse_alter()
* xxx_deparse_create()
* xxx_whatever()
where xxx is the main object (table, sequence, schema, etc).
Then order everything alphabetically, so that related stuff ends up
together. IMO grouping functions like this will also make reviewing
different objects far easier.
~~~
5. File size
As mentioned above in #4, the src/backend/commands/ddl_deparse.c is
huge (9200+ lines as at v32-0001). It is already unwieldy. Is there
some way to reduce this? For example, perhaps many of those
"utility/helper" functions (even though they are static) would be
better moved out to another file simply to get things down to a more
manageable size.
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-28 00:20:42 |
| Message-ID: | CAAD30UKX=PbojrjU0webYy7Y9mz1HmDTM3dx_JJXpc+dXW-EQQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> Hi, authors on this thread.
>
> The patch v32-0001 is very large, so it will take some time to review
> the code in detail.
Thanks for reviewing!
> Meanwhile, here are some general comments about the patch:
>
> ======
>
> 1. It might be useful to add this thread to the commitfest, if only so
> the cfbot can discover the latest patch set and alert about any rebase
> problems.
There is already a commitfest entry for the thread that I added back in March:
https://commitfest.postgresql.org/40/3595/
> 2. User interface design/documentation?
>
> Please consider adding user interface documentation, so it is
> available for review sooner than later. This comment was previously
> posted 2 weeks ago [1] but no replies.
>
> I can only guess (from the test of patch 0004) that the idea is to use
> another 'ddl' option for the 'publish' parameter:
> CREATE PUBLICATION mypub FOR ALL TABLES with (publish = 'insert,
> update, delete, ddl');
>
> My first impression is that a blanket option like that could be
> painful for some users who DO (for example) want the convenience of
> the DDL replication automagically creating new tables on the fly, but
> maybe they do NOT want the side-effect of replicating every other kind
> of DDL as well.
>
> Maybe such scenarios can be handled by another publication parameter
> which can allow more fine-grained DDL replication like:
> CREATE PUBLICATION mypub FOR ALL TABLES WITH (ddl = 'tables')
>
> I also have lots of usability questions but probably the documentation
> would give the answers to those. IMO the docs for the user interface
> and runtime behaviours should not be deferred - they should form part
> of this patch 0001.
We've been deferring the discussion on user interface syntax (and
documentation) until we get the DDL deparser in good shape. I agree it's
time to pick up the discussion again now that we're getting close to fully
integrating the DDL deparser with DDL replication. I think it makes sense
to introduce different DDL replication granularity levels; for example, the
most important levels would be ddl = 'tables' and ddl = 'database'
(or ddl = 'all').
> 5. File size
>
> As mentioned above in #4, the src/backend/commands/ddl_deparse.c is
> huge (9200+ lines as at v32-0001). It is already unwieldy. Is there
> some way to reduce this? For example, perhaps many of those
> "utility/helper" functions (even though they are static) would be
> better moved out to another file simply to get things down to a more
> manageable size.
Yes, I think we can split patch 0001 into a bare-bone patch for a few
essential commands and a patch
for the rest of the commands for ease of review.
Another topic we haven't discussed is the ownership of the replicated
objects. Currently all the replicated objects are owned by the subscription
owner regardless of their owners in the publisher database. I think we can
consider making this user-configurable so that the ownership of the
replicated objects matches that of their original owners, in certain use
cases such as a full-database logical replica scenario. Otherwise the DBA
will have to fix the ownership structure manually, which could be painful.
Thoughts?
Regards,
Zheng
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-28 00:38:50 |
| Message-ID: | CAHut+Ps2TZoqJtmhuLLBfeND6e6Unq40OtGec2LWeA2X_0nvsg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Oct 28, 2022 at 11:20 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> > 1. It might be useful to add this thread to the commitfest, if only so
> > the cfbot can discover the latest patch set and alert about any rebase
> > problems.
>
> There is already a commitfest entry for the thread that I added back in March:
> https://commitfest.postgresql.org/40/3595/
Sorry, I missed that earlier because I searched only by authors, and
some were missing. Now I see it has just been updated - thanks.
------
Kind Regards,
Peter Smith.
Fujitsu Australia.
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-28 06:50:05 |
| Message-ID: | CAHut+Psm+9q++y8b70QTeBeZiYcfDNtc_obxSwhxukp0MFhnRA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Here are some review comments for patch v32-0001.
This is a WIP - I have not yet looked at the largest file of this
patch (src/backend/commands/ddl_deparse.c)
======
Commit Message
1.
The list of the supported statements should be in alphabetical order
to make it easier to read
~~~
2.
The "Notes" are obviously notes, so the text does not need to say
"Note that..." etc again
"(Note #1) Note that some..." -> "(Note #1) Some..."
"(Note #2) Note that, for..." -> "(Note #2) For..."
"(Note #4) Note that, for..." -> "(Note #4) For..."
~~~
3.
For "Note #3", use uppercase for the SQL keywords in the example.
~~~
4.
For "Note #4":
"We created" -> "we created"
======
src/backend/catalog/aclchk.c
5. ExecuteGrantStmt
@@ -385,7 +385,11 @@ ExecuteGrantStmt(GrantStmt *stmt)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("grantor must be current user")));
+
+ istmt.grantor_uid = grantor;
}
+ else
+ istmt.grantor_uid = InvalidOid;
This code can be simpler by just declaring the 'grantor' variable at
function scope, then assigning the istmt.grantor_uid along with the
other grantor assignments.
SUGGESTION
Oid grantor = InvalidOid;
...
istmt.grantor_uid = grantor;
istmt.is_grant = stmt->is_grant;
istmt.objtype = stmt->objtype;
======
src/backend/commands/collationcmds.c
6. DefineCollation
+ /* make from existing collationid available to callers */
+ if (from_collid && OidIsValid(collid))
+ ObjectAddressSet(*from_collid,
+ CollationRelationId,
+ collid);
6a.
Maybe the param can be made 'from_existing_colid', then the above code
comment can be made more readable?
~
6b.
Seems some unnecessary wrapping here
======
src/backend/commands/ddl_deparse.c
WIP - I will try to post some review comments on this file next week
======
src/backend/commands/ddl_json.c
7. convSpecifier
typedef enum
{
SpecTypename,
SpecOperatorname,
SpecDottedName,
SpecString,
SpecNumber,
SpecStringLiteral,
SpecIdentifier,
SpecRole
} convSpecifier;
Inconsistent case. Some of these say "name" and some say "Name"
~~~
8. Forward declarations
char *ddl_deparse_json_to_string(char *jsonb);
Is this needed here? I thought this was already declared extern in
ddl_deparse.h.
~~~
9. find_string_in_jsonbcontainer
The function comment says "If it's of a type other than jbvString, an
error is raised.", but I do not see this check in the function code.
~~~
10. expand_fmt_recursive
/*
* Recursive helper for pg_event_trigger_expand_command
*
* Find the "fmt" element in the given container, and expand it into the
* provided StringInfo.
*/
10a.
I am not sure if the mention of "pg_event_trigger_expand_command" is
stale or is not relevant anymore, because that caller is not in this
module.
~
10b.
The first sentence is missing a period.
~~~
11.
value = findJsonbValueFromContainer(container, JB_FOBJECT, &key);
Should this be checking if value is NULL?
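e.g. perhaps something like (only a sketch):

if (value == NULL)
	elog(ERROR, "missing element in JSON object");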
~~~
12. expand_jsonval_dottedname
* Expand a json value as a dot-separated-name. The value must be of type
* object and may contain elements "schemaname" (optional), "objname"
* (mandatory), "attrname" (optional). Double quotes are added to each element
* as necessary, and dot separators where needed.
The comment says "The value must be of type object" but I don't see
any check/assert for that in the code.
~~~
13. expand_jsonval_typename
In other code (e.g. expand_jsonval_dottedname) there are lots of
pfree(str) so why not similar here?
e.g. Shouldn't the end of the function have something like what is shown below:
pfree(schema);
pfree(typename);
pfree(typmodstr);
~~~
14. expand_jsonval_operator
The function comment is missing a period.
~~~
15. expand_jsonval_string
/*
* Expand a JSON value as a string. The value must be of type string or of
* type object. In the latter case, it must contain a "fmt" element which will
* be recursively expanded; also, if the object contains an element "present"
* and it is set to false, the expansion is the empty string.
15a.
Although the comment says "The value must be of type string or of type
object" the code is checking for jbvString and jbvBinary (??)
~
15b.
else
return false;
Is that OK to just return false, or should this in fact be throwing an
error if the wrong type?
~~~
16. expand_jsonval_strlit
/* Easy case: if there are no ' and no \, just use a single quote */
if (strchr(str, '\'') == NULL &&
strchr(str, '\\') == NULL)
That could be simplified as:
if ((strpbk(str, "\'\\") == NULL)
~~~
17. expand_jsonval_number
strdatum = DatumGetCString(DirectFunctionCall1(numeric_out,
NumericGetDatum(jsonval->val.numeric)));
appendStringInfoString(buf, strdatum);
Shouldn't this function do pfree(strdatum) at the end?
~~~
18. expand_jsonval_role
/*
* Expand a JSON value as a role name. If the is_public element is set to
* true, PUBLIC is expanded (no quotes); otherwise, expand the given role name,
* quoting as an identifier.
*/
Maybe better to quote that element name -> 'If the "is_public" element
is set to true...'
~~~
19. expand_one_jsonb_element
The enum jbvType definition says that jbvBinary is a combination of
array/object, so I am not sure if that should be reflected in the
errmsg text (multiple places in this function body) instead of only
saying "JSON object".
~~~
20. ddl_deparse_expand_command
* % expand to a literal %.
Remove the period from that line (because none of the other specifier
descriptions have one).
======
src/backend/utils/adt/regproc.c
21. format_procedure_args_internal
+static void
+format_procedure_args_internal(Form_pg_proc procform, StringInfo buf,
+ bool force_qualify)
+{
+ int i;
+ int nargs = procform->pronargs;
The 'nargs' var is used one time only, so hardly seems worth having it.
~~~
22.
+ appendStringInfoString(buf,
+ force_qualify ?
+ format_type_be_qualified(thisargtype) :
+ format_type_be(thisargtype));
22a.
Should these function results be assigned to a char* ptr so that they
can be pfree(ptr) AFTER being appended to the 'buf'?
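e.g. something like this (just a sketch; 'typestr' is an illustrative name):

char	   *typestr;

typestr = force_qualify ? format_type_be_qualified(thisargtype)
						: format_type_be(thisargtype);
/* append, then free the deparsed type name */
appendStringInfoString(buf, typestr);
pfree(typestr);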
~
22b.
It's not really necessary to check force_qualify at every iteration. It is
more efficient to assign a function pointer outside this loop and just call
that here. IIRC something like this:
char * (*func[2])(Oid) = { format_type_be, format_type_be_qualified };
...
then
appendStringInfoString(buf, func[force_qualify](thisargtype));
======
src/backend/utils/adt/ruleutils.c
23. pg_get_ruledef_detailed
Instead of the multiple if/else it might be easier to just assign up-front:
*whereClause = NULL;
*actions = NIL;
Then the if blocks can just overwrite them.
Also, if you do that, then I expect probably the 'output' temp list
var is not needed at all.
~~~
24. pg_get_viewdef_internal
/*
* In the case that the CREATE VIEW command execution is still in progress,
* we would need to search the system cache RULERELNAME to get the rewrite
* rule of the view as oppose to querying pg_rewrite as in
pg_get_viewdef_worker(),
* the latter will return empty result.
*/
24a.
I'm not quite sure of the context of this function call. Maybe the
comment was supposed to be worded more like below?
"Because this function is called when CREATE VIEW command execution is
still in progress, we need to search..."
~
24b.
"as oppose" -> "as opposed"
~~~
25. pg_get_triggerdef_worker
if (!isnull)
{
Node *qual;
char *qualstr;
qual = stringToNode(TextDatumGetCString(value));
qualstr = pg_get_trigger_whenclause(trigrec, qual, pretty);
appendStringInfo(&buf, "WHEN (%s) ", qualstr);
}
After appending the qualstr to buf, should there be a pfree(qualstr)?
~~~
26. pg_get_trigger_whenclause
Missing function comment.
~~~
27. print_function_sqlbody
-static void
+void
print_function_sqlbody(StringInfo buf, HeapTuple proctup)
{
Missing function comment. Probably having a function comment is more
important now that this is not static?
======
src/include/tcop/ddl_deparse.h
28.
+extern char *deparse_utility_command(CollectedCommand *cmd, bool verbose_mode);
+extern char *ddl_deparse_json_to_string(char *jsonb);
+extern char *deparse_drop_command(const char *objidentity, const char
*objecttype,
+ DropBehavior behavior);
Function naming seems inconsistent. ('ddl_deparse_XXX' versus 'deparse_XXX').
------
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-29 09:30:42 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On 2022-Oct-28, Peter Smith wrote:
> Hi, authors on this thread.
>
> The patch v32-0001 is very large, so it will take some time to review
> the code in detail.
Yes, and I think there are a few things that should be split out from
it. Namely, all the stuff that generates the JSON blob should appear in
a second patch after supporting changes to other places such as
ruleutils and suchlike.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-31 08:07:37 |
| Message-ID: | CAHut+PsO0dwWoB4B6L3Ucd6D8ckgdXY2Sd3JK7F_3wLsXU7ZAg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Here are some more comments for the patch v32-0001, file:
src/backend/commands/ddl_deparse.c
This is a WIP, it being such a large file...
======
1. General - comments
For better consistency, I suggest using uppercase for all the
single-line comments in the function bodies.
There are multiple of them - I am not going to itemize them all in
this post. Please just search/replace all of them
e.g.
/* add the "ON table" clause */
/* add the USING clause, if any */
/* add the WITH CHECK clause, if any */
~~~
2. General - object names
There is a bit of inconsistency with the param object names where
there are multi-words.
Some have underscore (e.g. "is_public", "subtype_diff", "if_not_exists", etc)...
Many others do not (e.g. "schemaname", "objname", "rolename", etc)...
IMO it would be better to use a consistent naming convention - e.g.
maybe use '_' *everywhere*
~~~
3. ObjTree
+typedef struct ObjTree
+{
+ slist_head params; /* Object tree parameters */
+ int numParams; /* Number of parameters in the object tree */
+ StringInfo fmtinfo; /* Format string of the ObjTree */
+ bool present; /* Indicates if boolean value should be stored */
+} ObjTree;
It seems that this member is called "parameters" in the sense that
each of these params are destined to be substition-params of for the
format string part of this struct.
OK. That seems sensible here, but this 'parameter' terminology infests
this whole source file. IIUC really much of the code is dealing with
just JSON objects -- they don't become parameters until those objects
get added into the params list of this structure. Basically, I felt
the word 'parameter' in comments and the variables called 'param' in
functions seemed a bit overused...
~~~
4. ObjElem
+ slist_node node; /* Used in converting back to ObjElem
+ * structure */
+} ObjElem;
At face value (and without yet seeing the usage), that comment about
'node' does not mean much. e.g. this is already an 'ObjElem' struct...
(??)
~~~
5. verbose
+/*
+ * Reduce some unncessary string from the output json stuff when verbose
+ * and "present" member is false. This means these strings won't be merged into
+ * the last DDL command.
+ */
+bool verbose = true;
The comment needs some rewording to explain what this is about more
clearly and without the typos
"Reduce some unncessary string from the output json stuff" ???
~~~
6. add_policy_clauses
+ else
+ {
+ append_bool_object(policyStmt, "present", false);
+ }
Something seems strange. Probably I'm wrong but just by code
inspection it looks like there is potential for there to be multiple
param {present:false} JSON objects:
{"present" :false},
{"present" :false},
{"present" :false},
Shouldn't those all be array elements or something? IIUC apart from
just DDL, the JSON idea was going to (in future) allow potential
machine manipulation of the values prior to the replication, but
having all these ambiguous-looking objects does not seem to lend
itself to that idea readily. How to know what are each of those params
representing?
~~~
7. append_array_object
+ }
+
+ }
Spurious blank line
~~
8.
+ /* Extract the ObjElems whose present flag is true */
+ foreach(lc, array)
+ {
+ ObjElem *elem = (ObjElem *) lfirst(lc);
+
+ Assert(elem->objtype == ObjTypeObject ||
+ elem->objtype == ObjTypeString);
+
+ if (!elem->value.object->present &&
+ elem->objtype == ObjTypeObject)
+ array = foreach_delete_current(array, lc);
+ }
+
+ }
8a.
Is that comment correct? Or should it say more like "remove elements
where present flag is false" ??
8b.
It's not clear to me what is going to be the result of deleting the
array elements that are determined not present. Will this affect the
length of the array written to JSON? What if there is nothing left at
all - the top of this function return if the array length is zero, but
the bottom(after the loop) has not got similar logic.
~~~
9. append_bool_object
+ /*
+ * Check if the present is part of the format string and store the boolean
+ * value
+ */
+ if (strcmp(sub_fmt, "present") == 0)
The comment does not seem right. It is not testing whether "present" is
PART of the format string - it is testing whether it IS the ENTIRE
format string.
~~~
10. append_object_to_format_string
+ initStringInfo(&object_name);
+ end_ptr = sub_fmt + strlen(sub_fmt);
+
+ for (cp = sub_fmt; cp < end_ptr; cp++)
+ {
+ if (*cp == '{')
+ {
+ start_copy = true;
+ continue;
+ }
+
+ if (!start_copy)
+ continue;
+
+ if (*cp == ':' || *cp == '}')
+ break;
+
+ appendStringInfoCharMacro(&object_name, *cp);
+ }
Instead of this little loop why doesn't the code just look for the
name delimiters?
e.g.
pstart = strchr(sub_fmt, '{');
pend = strpbrk(pstart, ":}");
then the 'name' is what lies in between...
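A rough sketch of that suggestion (keeping the existing object_name
StringInfo; the early returns for a missing or malformed name are
assumptions about the desired behaviour, not patch code):

const char *pstart = strchr(sub_fmt, '{');
const char *pend;

if (pstart == NULL)
    return NULL;        /* format string has no parameter name */
pstart++;               /* skip the '{' itself */
pend = strpbrk(pstart, ":}");
if (pend == NULL)
    return NULL;        /* malformed format string */
appendBinaryStringInfo(&object_name, pstart, pend - pstart);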
~~~
11.
format_type_detailed(Oid type_oid, int32 typemod,
Oid *nspid, char **typname, char **typemodstr,
bool *typarray)
There seems to be a mixture of param prefixes, both 'typ' and 'type'.
Is that correct? If these are changed, also check the function
comment.
~~~
12.
+ /*
+ * Special-case crock for types with strange typmod rules where we put
+ * typmod at the middle of name(e.g. TIME(6) with time zone ). We cannot
+ * schema-qualify nor add quotes to the type name in these cases.
+ */
Missing space before '(e.g.'. Extra space before ').'.
~~~
13. FunctionGetDefaults
/*
* Return the defaults values of arguments to a function, as a list of
* deparsed expressions.
*/
"defaults values" -> "default values"
------
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-10-31 10:47:46 |
| Message-ID: | CALDaNm30PFSLoBjb_dewmRwbXSmytn1xt8yapjjmstHXGxgbRg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, 27 Oct 2022 at 16:02, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Thu, 27 Oct 2022 at 02:09, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> >
> > > Adding support for deparsing of CREATE/ALTER/DROP LANGUAGE for ddl replication.
> >
> > Adding support for deparsing of:
> > COMMENT
> > ALTER DEFAULT PRIVILEGES
> > CREATE/DROP ACCESS METHOD
>
> Adding support for deparsing of:
> ALTER/DROP ROUTINE
>
> The patch also includes fixes for the following issues:
Few comments:
1) If the function specifies a table without a schema, should we include
the schema name too? Otherwise it will fail with "relation does not
exist" on the subscriber.
+ /* Add the function definition */
+ (void) SysCacheGetAttr(PROCOID, procTup,
Anum_pg_proc_prosqlbody, &isnull);
+ if (procForm->prolang == SQLlanguageId && !isnull)
+ {
+ StringInfoData buf;
+
+ initStringInfo(&buf);
+ print_function_sqlbody(&buf, procTup);
+
+ append_string_object(createFunc, "%{definition}s", buf.data);
+ }
ex:
CREATE PROCEDURE insert_data(a integer)
LANGUAGE SQL
AS $$
INSERT INTO tbl VALUES (a);
$$;
2) This function should handle "alter procedure" too:
+/*
+ * Deparse an AlterFunctionStmt (ALTER FUNCTION/ROUTINE)
+ *
+ * Given a function OID and the parse tree that created it, return the JSON
+ * blob representing the alter command.
+ */
+static ObjTree *
+deparse_AlterFunction(Oid objectId, Node *parsetree)
+{
+ AlterFunctionStmt *node = (AlterFunctionStmt *) parsetree;
+ ObjTree *alterFunc;
+ ObjTree *sign;
+ HeapTuple procTup;
Currently "alter procedure" statements are replicated as "alter
function" statements on the subscriber.
3) A few extensions (e.g. hstore) execute "alter operator family"; we
should exclude replicating "alter operator family" while CREATE
EXTENSION is in progress:
/* Don't deparse SQL commands generated while creating extension */
if (cmd->in_extension)
return NULL;
The above check should be included in the code below, else the CREATE
EXTENSION statement will fail because the internal statements will be
executed:
+static ObjTree *
+deparse_AlterOpFamily(CollectedCommand *cmd)
+{
+ ObjTree *alterOpFam;
+ AlterOpFamilyStmt *stmt = (AlterOpFamilyStmt *) cmd->parsetree;
+ HeapTuple ftp;
+ Form_pg_opfamily opfForm;
+ List *list;
+ ListCell *cell;
+
+ ftp = SearchSysCache1(OPFAMILYOID,
+                       ObjectIdGetDatum(cmd->d.opfam.address.objectId));
+ if (!HeapTupleIsValid(ftp))
+     elog(ERROR, "cache lookup failed for operator family %u",
+          cmd->d.opfam.address.objectId);
+ opfForm = (Form_pg_opfamily) GETSTRUCT(ftp);
+
4) This if...else can be removed; the nspid and typname for the other
types can be handled in the default case, and *nspid can be set to
InvalidOid at the beginning.
+ if (type_oid == INTERVALOID ||
+     type_oid == TIMESTAMPOID ||
+     type_oid == TIMESTAMPTZOID ||
+     type_oid == TIMEOID ||
+     type_oid == TIMETZOID)
+ {
+     switch (type_oid)
+     {
+         case INTERVALOID:
+             *typname = pstrdup("INTERVAL");
+             break;
+         case TIMESTAMPTZOID:
+             if (typemod < 0)
+                 *typname = pstrdup("TIMESTAMP WITH TIME ZONE");
+             else
+                 /* otherwise, WITH TZ is added by typmod. */
+                 *typname = pstrdup("TIMESTAMP");
+             break;
+         case TIMESTAMPOID:
+             *typname = pstrdup("TIMESTAMP");
+             break;
+         case TIMETZOID:
+             if (typemod < 0)
+                 *typname = pstrdup("TIME WITH TIME ZONE");
+             else
+                 /* otherwise, WITH TZ is added by typmod. */
+                 *typname = pstrdup("TIME");
+             break;
+         case TIMEOID:
+             *typname = pstrdup("TIME");
+             break;
+     }
+     *nspid = InvalidOid;
+ }
+ else
+ {
+     /*
+      * No additional processing is required for other types, so get the
+      * type name and schema directly from the catalog.
+      */
+     *nspid = typeform->typnamespace;
+     *typname = pstrdup(NameStr(typeform->typname));
+ }
5) The following includes are not required in ddl_deparse.c:
#include "catalog/pg_attribute.h"
#include "catalog/pg_class.h"
#include "lib/ilist.h"
#include "nodes/makefuncs.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
6) Inconsistent error reporting:
In a few places elog is used and in a few places ereport is used:
+ HeapTuple polTup = get_catalog_object_by_oid(polRel, Anum_pg_policy_oid, policyOid);
+ Form_pg_policy polForm;
+
+ if (!HeapTupleIsValid(polTup))
+     elog(ERROR, "cache lookup failed for policy %u", policyOid);
+ char *rolename;
+
+ roltup = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleoid));
+ if (!HeapTupleIsValid(roltup))
+     ereport(ERROR,
+             (errcode(ERRCODE_UNDEFINED_OBJECT),
+              errmsg("role with OID %u does not exist", roleoid)));
We can try to use the same style of error reporting.
7) There are a few small indentation issues; we could run pgindent:
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -50,7 +50,8 @@ typedef struct
* CREATE COLLATION
*/
ObjectAddress
-DefineCollation(ParseState *pstate, List *names, List *parameters,
bool if_not_exists)
+DefineCollation(ParseState *pstate, List *names, List *parameters,
+ bool if_not_exists, ObjectAddress *from_collid)
{
8) Inclusion ordering in ddl_deparse.c:
8.a) The following should be slightly reordered
+#include "access/amapi.h"
+#include "access/table.h"
+#include "access/relation.h"
8.b) The following should be slightly reordered
+#include "postgres.h"
+#include "tcop/ddl_deparse.h"
+#include "access/amapi.h"
9) In few places multi line comment can be changed to single line comment:
9.a)
+ /*
+ * Fetch the pg_class tuple of the index relation
+ */
9.b)
+ /*
+ * Fetch the pg_am tuple of the index' access method
+ */
9.c)
+ /*
+ * Reject unsupported case right away.
+ */
10) This should also specify ROUTINE in the comment
/*
* Verbose syntax
*
* ALTER FUNCTION %{signature}s %{definition: }s
*/
alterFunc = new_objtree_VA(node->objtype == OBJECT_ROUTINE ?
"ALTER ROUTINE" : "ALTER FUNCTION", 0);
11) This can be changed to alphabetical order (collation first and then column):
11.a)
+ case OBJECT_COLUMN:
+ return "COLUMN";
+ case OBJECT_COLLATION:
+ return "COLLATION";
+ case OBJECT_CONVERSION:
+ return "CONVERSION";
11.b) similarly here:
case OBJECT_FDW:
return "FOREIGN DATA WRAPPER";
case OBJECT_FOREIGN_SERVER:
return "SERVER";
case OBJECT_FOREIGN_TABLE:
return "FOREIGN TABLE";
11.c) similarly here:
case OBJECT_FUNCTION:
return "FUNCTION";
case OBJECT_ROUTINE:
return "ROUTINE";
case OBJECT_INDEX:
return "INDEX";
11.d) similarly here:
case OBJECT_OPCLASS:
return "OPERATOR CLASS";
case OBJECT_OPERATOR:
return "OPERATOR";
case OBJECT_OPFAMILY:
return "OPERATOR FAMILY";
11.e) similarly here:
case OBJECT_TRIGGER:
return "TRIGGER";
case OBJECT_TSCONFIGURATION:
return "TEXT SEARCH CONFIGURATION";
/*
* case OBJECT_TSCONFIG_MAPPING:
* return "TEXT SEARCH CONFIGURATION MAPPING";
*/
case OBJECT_TSDICTIONARY:
return "TEXT SEARCH DICTIONARY";
case OBJECT_TSPARSER:
return "TEXT SEARCH PARSER";
case OBJECT_TSTEMPLATE:
return "TEXT SEARCH TEMPLATE";
case OBJECT_TYPE:
return "TYPE";
12) new_objtree can be used instead of new_objtree_VA when there are no
arguments; one additional check can be avoided
12.a) alterFunc = new_objtree_VA(node->objtype == OBJECT_ROUTINE ?
"ALTER ROUTINE" : "ALTER FUNCTION", 0);
12.b) ObjTree *tmpobj = new_objtree_VA("", 0);
12.c) tmpobj = new_objtree_VA(strVal(defel->arg), 0);
12.d) tmpobj = new_objtree_VA("ROWS", 0);
12.e) grantStmt = new_objtree_VA(fmt, 0);
12.f) tmp = new_objtree_VA("ALL PRIVILEGES", 0);
12.g) tmpobj2 = new_objtree_VA("FOR ORDER BY", 0);
12.h) composite = new_objtree_VA("CREATE TYPE", 0);
12.i) tmp = new_objtree_VA("OPTIONS", 0);
12.j) tmp = new_objtree_VA("NO HANDLER", 0);
12.k) .... similarly in few more places .....
13) In a few places we use RowExclusiveLock and in a few places we use
AccessShareLock; is this intentional?
+ ObjTree *tmp;
+
+ rel = table_open(ForeignDataWrapperRelationId, RowExclusiveLock);
+
+ fdwTup = SearchSysCache1(FOREIGNDATAWRAPPEROID,
+                          ObjectIdGetDatum(objectId));
+ List *list = NIL;
+ ListCell *cell;
+
+ pg_extension = table_open(ExtensionRelationId, AccessShareLock);
There are similar instances elsewhere too.
14) Can the else statement be removed, since we are not appending anything?
+ if (typForm->typnotnull)
+ append_string_object(createDomain, "%{not_null}s", "NOT NULL");
+ else
+ append_string_object(createDomain, "%{not_null}s", "");
15) This might not be supported currently; it might be dead code, as we
will be throwing the error "CREATE EXTENSION ... FROM is no longer
supported"
+ else if (strcmp(opt->defname, "old_version") == 0)
+ {
+     tmp = new_objtree_VA("FROM %{version}L", 2,
+                          "type", ObjTypeString, "from",
+                          "version", ObjTypeString, defGetString(opt));
+     list = lappend(list, new_object_object(tmp));
+ }
Regards,
Vignesh
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-01 23:43:29 |
| Message-ID: | CALDaNm0ACgK85Dz0NqC717SZW0kRy69BSn6AzBV_B6S4ZmkNCQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Mon, 31 Oct 2022 at 16:17, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Thu, 27 Oct 2022 at 16:02, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Thu, 27 Oct 2022 at 02:09, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > >
> > > > Adding support for deparsing of CREATE/ALTER/DROP LANGUAGE for ddl replication.
> > >
> > > Adding support for deparsing of:
> > > COMMENT
> > > ALTER DEFAULT PRIVILEGES
> > > CREATE/DROP ACCESS METHOD
> >
> > Adding support for deparsing of:
> > ALTER/DROP ROUTINE
> >
> > The patch also includes fixes for the following issues:
>
Few comments:
1) Empty () should be appended if there are no table elements:
+ tableelts = deparse_TableElements(relation, node->tableElts, dpcontext,
+                                   false,   /* not typed table */
+                                   false);  /* not composite */
+ tableelts = obtainConstraints(tableelts, objectId, InvalidOid);
+
+ append_array_object(createStmt, "(%{table_elements:, }s)", tableelts);
This is required for:
CREATE TABLE ihighway () INHERITS (road);
2)
2.a)
Here cell2 will be of type RoleSpec; the code below should be changed:
+ foreach(cell2, (List *) opt->arg)
+ {
+     String  *val = lfirst_node(String, cell2);
+     ObjTree *obj = new_objtree_for_role(strVal(val));
+
+     roles = lappend(roles, new_object_object(obj));
+ }
to:
foreach(cell2, (List *) opt->arg)
{
RoleSpec *rolespec = lfirst(cell2);
ObjTree *obj = new_objtree_for_rolespec(rolespec);
roles = lappend(roles, new_object_object(obj));
}
This change is required for:
ALTER DEFAULT PRIVILEGES FOR ROLE regress_selinto_user REVOKE INSERT
ON TABLES FROM regress_selinto_user;
2.b) After the above change the following function can be removed:
+/*
+ * Helper routine for %{}R objects, with role specified by name.
+ */
+static ObjTree *
+new_objtree_for_role(char *rolename)
+{
+     ObjTree *role;
+
+     role = new_objtree_VA(NULL, 2,
+                           "is_public", ObjTypeBool,
+                           strcmp(rolename, "public") == 0,
+                           "rolename", ObjTypeString, rolename);
+     return role;
+}
3) There was a crash in this materialized view scenario:
+ /* add the query */
+ Assert(IsA(node->query, Query));
+ append_string_object(createStmt, "AS %{query}s",
+                      pg_get_querydef((Query *) node->query, false));
+
+ /* add a WITH NO DATA clause */
+ tmp = new_objtree_VA("WITH NO DATA", 1,
+                      "present", ObjTypeBool,
+                      node->into->skipData ? true : false);
CREATE TABLE mvtest_t (id int NOT NULL PRIMARY KEY, type text NOT
NULL, amt numeric NOT NULL);
CREATE VIEW mvtest_tv AS SELECT type, sum(amt) AS totamt FROM mvtest_t
GROUP BY type;
CREATE VIEW mvtest_tvv AS SELECT sum(totamt) AS grandtot FROM mvtest_tv;
CREATE MATERIALIZED VIEW mvtest_tvvm AS SELECT * FROM mvtest_tvv;
CREATE VIEW mvtest_tvvmv AS SELECT * FROM mvtest_tvvm;
CREATE MATERIALIZED VIEW mvtest_bb AS SELECT * FROM mvtest_tvvmv;
#0 0x0000560d45637897 in AcquireRewriteLocks (parsetree=0x0,
forExecute=false, forUpdatePushedDown=false) at rewriteHandler.c:154
#1 0x0000560d45637b93 in AcquireRewriteLocks
(parsetree=0x560d467c4778, forExecute=false,
forUpdatePushedDown=false) at rewriteHandler.c:269
#2 0x0000560d457f792a in get_query_def (query=0x560d467c4778,
buf=0x7ffeb8059bd0, parentnamespace=0x0, resultDesc=0x0,
colNamesVisible=true, prettyFlags=2, wrapColumn=0, startIndent=0) at
ruleutils.c:5566
#3 0x0000560d457ee869 in pg_get_querydef (query=0x560d467c4778,
pretty=false) at ruleutils.c:1639
#4 0x0000560d453437f6 in deparse_CreateTableAsStmt_vanilla
(objectId=24591, parsetree=0x560d467c4748) at ddl_deparse.c:7076
#5 0x0000560d45348864 in deparse_simple_command (cmd=0x560d467c3b98)
at ddl_deparse.c:9158
#6 0x0000560d45348b75 in deparse_utility_command (cmd=0x560d467c3b98,
verbose_mode=false) at ddl_deparse.c:9273
#7 0x0000560d45351627 in publication_deparse_ddl_command_end
(fcinfo=0x7ffeb8059e90) at event_trigger.c:2517
#8 0x0000560d4534eeb1 in EventTriggerInvoke
(fn_oid_list=0x560d467b5450, trigdata=0x7ffeb8059ef0) at
event_trigger.c:1082
#9 0x0000560d4534e61c in EventTriggerDDLCommandEnd
(parsetree=0x560d466e8a88) at event_trigger.c:732
#10 0x0000560d456b6ee2 in ProcessUtilitySlow (pstate=0x560d467cdee8,
pstmt=0x560d466e9a18, queryString=0x560d466e7c38 "CREATE MATERIALIZED
VIEW mvtest_bb AS SELECT * FROM mvtest_tvvmv;",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x560d467cb5d8, qc=0x7ffeb805a6f0) at utility.c:1926
4) The following statements crash:
BEGIN;
CREATE TABLE t (c int);
SAVEPOINT q;
CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD
SELECT * FROM generate_series(1,5) t0(c); -- fails due to policy p on t
ROLLBACK TO q;
CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD
SELECT * FROM generate_series(1,5) t0(c); -- succeeds
ROLLBACK;
#4 0x00007f3f7c8eb7b7 in __GI_abort () at abort.c:79
#5 0x0000561e569a819c in ExceptionalCondition
(conditionName=0x561e56b932f0 "rel->pgstat_info->relation == NULL",
fileName=0x561e56b932ab "pgstat_relation.c", lineNumber=142) at
assert.c:66
#6 0x0000561e567f3569 in pgstat_assoc_relation (rel=0x7f3d6cc9d4e8)
at pgstat_relation.c:142
#7 0x0000561e5628ade3 in initscan (scan=0x561e57a67648, key=0x0,
keep_startblock=false) at heapam.c:340
#8 0x0000561e5628c7be in heap_beginscan (relation=0x7f3d6cc9d4e8,
snapshot=0x561e579c4da0, nkeys=0, key=0x0, parallel_scan=0x0,
flags=449) at heapam.c:1220
#9 0x0000561e5674ff5a in table_beginscan (rel=0x7f3d6cc9d4e8,
snapshot=0x561e579c4da0, nkeys=0, key=0x0) at
../../../src/include/access/tableam.h:891
#10 0x0000561e56750fa8 in DefineQueryRewrite (rulename=0x561e57991660
"_RETURN", event_relid=40960, event_qual=0x0, event_type=CMD_SELECT,
is_instead=true, replace=false, action=0x561e57a60648)
at rewriteDefine.c:447
#11 0x0000561e567505cc in DefineRule (stmt=0x561e57991d68,
queryString=0x561e57990c38 "CREATE RULE \"_RETURN\" AS ON SELECT TO t
DO INSTEAD\n SELECT * FROM generate_series(1,5) t0(c);") at
rewriteDefine.c:213
#12 0x0000561e567d157a in ProcessUtilitySlow (pstate=0x561e579bae18,
pstmt=0x561e579920a8,
queryString=0x561e57990c38 "CREATE RULE \"_RETURN\" AS ON SELECT
TO t DO INSTEAD\n SELECT * FROM generate_series(1,5) t0(c);",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x561e57992188, qc=0x7ffcf2482ea0) at utility.c:1657
5) WHERE clause handling in rule deparsing:
5.a) The WHERE clause should come before INSTEAD:
+ append_string_object(ruleStmt, "DO %{instead}s",
+                      node->instead ? "INSTEAD" : "ALSO");
+
+ ev_qual = heap_getattr(rewrTup, Anum_pg_rewrite_ev_qual,
+                        RelationGetDescr(pg_rewrite), &isnull);
+ ev_actions = heap_getattr(rewrTup, Anum_pg_rewrite_ev_action,
+                           RelationGetDescr(pg_rewrite), &isnull);
+
+ pg_get_ruledef_detailed(ev_qual, ev_actions, &qual, &actions);
+
+ tmp = new_objtree_VA("WHERE %{clause}s", 0);
+
+ if (qual)
+     append_string_object(tmp, "clause", qual);
+ else
+ {
+     append_null_object(tmp, "clause");
+     append_bool_object(tmp, "present", false);
+ }
+
+ append_object_object(ruleStmt, "where_clause", tmp);
5.b) "clause" should be changed to "%{clause}s" in both places.
It can be changed to:
.....
ev_qual = heap_getattr(rewrTup, Anum_pg_rewrite_ev_qual,
RelationGetDescr(pg_rewrite), &isnull);
ev_actions = heap_getattr(rewrTup, Anum_pg_rewrite_ev_action,
RelationGetDescr(pg_rewrite), &isnull);
pg_get_ruledef_detailed(ev_qual, ev_actions, &qual, &actions);
tmp = new_objtree_VA("WHERE", 0);
if (qual)
append_string_object(tmp, "%{clause}s", qual);
else
{
append_null_object(tmp, "%{clause}s");
append_bool_object(tmp, "present", false);
}
append_object_object(ruleStmt, "%{where_clause}s", tmp);
append_string_object(ruleStmt, "DO %{instead}s",
node->instead ? "INSTEAD" : "ALSO");
.....
CREATE RULE qqq AS ON INSERT TO public.copydml_test DO INSTEAD WHERE
(new.t OPERATOR(pg_catalog.<>) 'f'::pg_catalog.text) DELETE FROM
public.copydml_test
6) Renaming a table constraint is not handled:
+/*
+ * Deparse a RenameStmt.
+ */
+static ObjTree *
+deparse_RenameStmt(ObjectAddress address, Node *parsetree)
+{
+ RenameStmt *node = (RenameStmt *) parsetree;
+ ObjTree *renameStmt;
+ char *fmtstr;
+ const char *objtype;
+ Relation relation;
+ Oid schemaId;
+
ALTER TABLE onek ADD CONSTRAINT onek_check_constraint CHECK (unique1 >= 0);
ALTER TABLE onek RENAME CONSTRAINT onek_check_constraint TO
onek_check_constraint_foo;
7) The following deparsing of index fails:
CREATE TABLE covering_index_heap (f1 int, f2 int, f3 text);
CREATE UNIQUE INDEX covering_index_index on covering_index_heap
(f1,f2) INCLUDE(f3);
8) default should be %{default}s
+deparse_CreateConversion(Oid objectId, Node *parsetree)
+{
+ HeapTuple conTup;
+ Relation convrel;
+ Form_pg_conversion conForm;
+ ObjTree *ccStmt;
+ ObjTree *tmpObj;
+
+ convrel = table_open(ConversionRelationId, AccessShareLock);
+ conTup = get_catalog_object_by_oid(convrel, Anum_pg_conversion_oid, objectId);
+ if (!HeapTupleIsValid(conTup))
+     elog(ERROR, "cache lookup failed for conversion with OID %u", objectId);
+ conForm = (Form_pg_conversion) GETSTRUCT(conTup);
+
+ /*
+  * Verbose syntax
+  *
+  * CREATE %{default}s CONVERSION %{identity}D FOR %{source}L TO %{dest}L
+  * FROM %{function}D
+  */
+ ccStmt = new_objtree("CREATE");
+
+
+ /* Add the DEFAULT clause */
+ append_string_object(ccStmt, "default",
+                      conForm->condefault ? "DEFAULT" : "");
9) Renaming a domain constraint is not handled:
+/*
+ * Deparse a RenameStmt.
+ */
+static ObjTree *
+deparse_RenameStmt(ObjectAddress address, Node *parsetree)
+{
+ RenameStmt *node = (RenameStmt *) parsetree;
+ ObjTree *renameStmt;
+ char *fmtstr;
+ const char *objtype;
+ Relation relation;
+ Oid schemaId;
+
Regards,
Vignesh
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-03 07:40:59 |
| Message-ID: | CAHut+Puxo_kq2toicNK_BQdeccK3REGW-Xv8tVauFvTNku6V-w@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Here are some more review comments for the v32-0001 file ddl_deparse.c
(This is a WIP since it is such a large file)
======
1. General - calling VA with 0 args
There are some calls to new_objtree_VA() where 0 extra args are passed.
e.g. See in deparse_AlterFunction
* alterFunc = new_objtree_VA("ALTER FUNCTION", 0);
* ObjTree *tmpobj = new_objtree_VA("%{type}T", 0);
* tmpobj = new_objtree_VA(intVal(defel->arg) ? "RETURNS NULL ON NULL
INPUT" : "CALLED ON NULL INPUT", 0);
* tmpobj = new_objtree_VA(intVal(defel->arg) ? "SECURITY DEFINER" :
"SECURITY INVOKER", 0);
* tmpobj = new_objtree_VA(intVal(defel->arg) ? "LEAKPROOF" : "NOT
LEAKPROOF", 0);
* etc.
Shouldn't all those just be calling the new_objtree() function instead
of new_objtree_VA()?
Probably there are more than just those cited - please search for others.
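For example, the first of those could simply be (sketch):

alterFunc = new_objtree(node->objtype == OBJECT_ROUTINE ?
                        "ALTER ROUTINE" : "ALTER FUNCTION");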
~~~
2. General - when to call append_xxx functions?
I did not always understand what seems like an arbitrary choice of
function calls to append_xxx.
e.g. Function deparse_AlterObjectSchemaStmt() does:
+ append_string_object(alterStmt, "%{identity}s", ident);
+
+ append_string_object(alterStmt, "SET SCHEMA %{newschema}I", newschema);
Why doesn't the above just use new_objtree_VA instead -- it seemed to
me like the _VA function is underused in some places. Maybe search all
the append_xxx usage - I suspect many of those can in fact be combined
to use new_objtree_VA().
~~~
3. General - extract object names from formats
IIUC the calls to append_XXX_object will call deeper to
append_object_to_format_string(), which has a main side-effect loop to
extract the "name" part out of the sub_fmt string. But this logic all
seems inefficient and unnecessary to me. I think in most (all?) cases
the caller already knows what the object name should be, so instead of
making code work to figure it out again, it can just be passed in the
same way the _VA() function passes the known object name.
There are many cases of this:
e.g.
BEFORE
append_string_object(alterop, "(%{left_type}s", "NONE");
AFTER - just change the signature to pass the known object name
append_string_object(alterop, "(%{left_type}s", "left_type", "NONE");
~~~
4. General - fwd declares
static void append_array_object(ObjTree *tree, char *name, List *array);
static void append_bool_object(ObjTree *tree, char *name, bool value);
static void append_float_object(ObjTree *tree, char *name, float8 value);
static void append_null_object(ObjTree *tree, char *name);
static void append_object_object(ObjTree *tree, char *name, ObjTree *value);
static char *append_object_to_format_string(ObjTree *tree, char *sub_fmt);
static void append_premade_object(ObjTree *tree, ObjElem *elem);
static void append_string_object(ObjTree *tree, char *name, char *value);
I think those signatures are misleading. AFAICS what is called
the 'name' param above is often a 'sub_fmt' param in the
implementation.
~~~
5. General - inconsistent append_array object calls.
Sometimes enclosing brackets are included as part of the format string
to be appended and other times they are appended separately. IIUC
there is no difference but IMO the code should always be consistent to
avoid it being confusing.
e.g.1 (brackets in fmt)
append_array_object(tmpobj, "(%{rettypes:, }s)", rettypes);
e.g.2 (brackets appended separately)
+ append_format_string(tmpobj, "(");
+ append_array_object(tmpobj, "%{argtypes:, }T", arglist);
+ append_format_string(tmpobj, ")");
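e.g.2 could then be collapsed to match e.g.1 (sketch):

append_array_object(tmpobj, "(%{argtypes:, }T)", arglist);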
~~~
6. General - missing space before '('
I noticed a number of comments where there is a space missing before a '('.
Here are some examples:
- * An element of an object tree(ObjTree).
- * typmod at the middle of name(e.g. TIME(6) with time zone ). We cannot
- * Sequence for IDENTITY COLUMN output separately(via CREATE TABLE or
- * Sequence for IDENTITY COLUMN output separately(via CREATE TABLE or
Search all the patch-code to find others and add missing spaces.
~~~
7. General - Verbose syntax comments
Some (but not all) of the deparse_XXX functions have a comment
describing the verbose syntax.
e.g.
/*
* Verbose syntax
*
* CREATE %{default}s CONVERSION %{identity}D FOR %{source}L TO %{dest}L
* FROM %{function}D
*/
These are helpful for understanding the logic of the function, so IMO
similar comments should be written for *all* of the deparse_xxx
functions.
And maybe a more appropriate place to put these comments is in the
function header comment.
======
8. new_objtree_VA
+ /*
+ * For all other param types there must be a value in the varargs.
+ * Fetch it and add the fully formed subobject into the main object.
+ */
+ switch (type)
What does the comment mean when it says - for all "other" param types?
~~~
9. objtree_to_jsonb_element
+ ListCell *cell;
+ JsonbValue val;
The 'cell' is only for the ObjTypeArray so consider declaring it for
that case only.
~~~
10. obtainConstraints
+ else
+ {
+ Assert(OidIsValid(domainId));
Looks like the above Assert is unnecessary because the earlier Assert
(below) already ensures this:
+ /* Only one may be valid */
+ Assert(OidIsValid(relationId) ^ OidIsValid(domainId));
~~~
11. pg_get_indexdef_detailed
+ /* Output tablespace */
+ {
+ Oid tblspc;
+
+ tblspc = get_rel_tablespace(indexrelid);
+ if (OidIsValid(tblspc))
+ *tablespace = pstrdup(quote_identifier(get_tablespace_name(tblspc)));
+ else
+ *tablespace = NULL;
+ }
+
+ /* Report index predicate, if any */
+ if (!heap_attisnull(ht_idx, Anum_pg_index_indpred, NULL))
+ {
+ Node *node;
+ Datum predDatum;
+ char *predString;
+
+ /* Convert text string to node tree */
+ predDatum = SysCacheGetAttr(INDEXRELID, ht_idx,
+ Anum_pg_index_indpred, &isnull);
+ Assert(!isnull);
+ predString = TextDatumGetCString(predDatum);
+ node = (Node *) stringToNode(predString);
+ pfree(predString);
+
+ /* Deparse */
+ *whereClause =
+ deparse_expression(node, context, false, false);
+ }
+ else
+ *whereClause = NULL;
Maybe just assign defaults:
*tablespace = NULL;
*whereClause = NULL;
then overwrite those defaults, so can avoid the 'else' code.
~~~
12. stringify_objtype
+/*
+ * Return the given object type as a string.
+ */
+static const char *
+stringify_objtype(ObjectType objtype)
12a.
This static function feels like it belongs in another module as
a utility function.
~
12b.
Actually, this function looks like it might be more appropriate just
as a static lookup array/map of names keyed by ObjectType, using a
StaticAssertDecl for sanity checking.
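Something along these lines, perhaps (a partial sketch only, not patch
code; NUM_OBJECT_TYPES is a hypothetical sentinel assumed to be added at
the end of the ObjectType enum):

static const char *const objtype_names[] = {
    [OBJECT_COLLATION] = "COLLATION",
    [OBJECT_COLUMN] = "COLUMN",
    [OBJECT_CONVERSION] = "CONVERSION",
    [OBJECT_TABLE] = "TABLE",
    /* ... one entry for every supported ObjectType ... */
};

StaticAssertDecl(lengthof(objtype_names) == NUM_OBJECT_TYPES,
                 "objtype_names[] must cover every ObjectType");

static const char *
stringify_objtype(ObjectType objtype)
{
    if ((size_t) objtype >= lengthof(objtype_names) ||
        objtype_names[objtype] == NULL)
        elog(ERROR, "unsupported object type %d", (int) objtype);

    return objtype_names[objtype];
}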
~~~
13. deparse_GrantStmt
+ /*
+ * If there are no objects from "ALL ... IN SCHEMA" to be granted, then we
+ * need not do anything.
+ */
+ if (istmt->objects == NIL)
+ return NULL;
"we need not do anything." -> "nothing to do."
~~~
14. deparse_GrantStmt
+ switch (istmt->objtype)
+ {
+ case OBJECT_COLUMN:
+ case OBJECT_TABLE:
+ objtype = "TABLE";
+ classId = RelationRelationId;
+ break;
+ case OBJECT_SEQUENCE:
+ objtype = "SEQUENCE";
+ classId = RelationRelationId;
+ break;
+ case OBJECT_DOMAIN:
+ objtype = "DOMAIN";
+ classId = TypeRelationId;
+ break;
+ case OBJECT_FDW:
+ objtype = "FOREIGN DATA WRAPPER";
+ classId = ForeignDataWrapperRelationId;
+ break;
+ case OBJECT_FOREIGN_SERVER:
+ objtype = "FOREIGN SERVER";
+ classId = ForeignServerRelationId;
+ break;
+ case OBJECT_FUNCTION:
+ objtype = "FUNCTION";
+ classId = ProcedureRelationId;
+ break;
+ case OBJECT_PROCEDURE:
+ objtype = "PROCEDURE";
+ classId = ProcedureRelationId;
+ break;
+ case OBJECT_ROUTINE:
+ objtype = "ROUTINE";
+ classId = ProcedureRelationId;
+ break;
+ case OBJECT_LANGUAGE:
+ objtype = "LANGUAGE";
+ classId = LanguageRelationId;
+ break;
+ case OBJECT_LARGEOBJECT:
+ objtype = "LARGE OBJECT";
+ classId = LargeObjectRelationId;
+ break;
+ case OBJECT_SCHEMA:
+ objtype = "SCHEMA";
+ classId = NamespaceRelationId;
+ break;
+ case OBJECT_TYPE:
+ objtype = "TYPE";
+ classId = TypeRelationId;
+ break;
+ case OBJECT_DATABASE:
+ case OBJECT_TABLESPACE:
+ objtype = "";
+ classId = InvalidOid;
+ elog(ERROR, "global objects not supported");
+ break;
+ default:
+ elog(ERROR, "invalid OBJECT value %d", istmt->objtype);
+ }
Shouldn't this code be calling the other function stringify_objtype()
to do some of this?
~~~
15.
+ grantStmt = new_objtree_VA(fmt, 0);
+
+ /* build a list of privileges to grant/revoke */
+ if (istmt->all_privs)
+ {
+ tmp = new_objtree_VA("ALL PRIVILEGES", 0);
Here are some more examples of the _VA function being called with 0
args. Why use _VA function?
~~~
16.
+ list = NIL;
+
+ if (istmt->privileges & ACL_INSERT)
+ list = lappend(list, new_string_object("INSERT"));
+ if (istmt->privileges & ACL_SELECT)
+ list = lappend(list, new_string_object("SELECT"));
+ if (istmt->privileges & ACL_UPDATE)
+ list = lappend(list, new_string_object("UPDATE"));
+ if (istmt->privileges & ACL_DELETE)
+ list = lappend(list, new_string_object("DELETE"));
+ if (istmt->privileges & ACL_TRUNCATE)
+ list = lappend(list, new_string_object("TRUNCATE"));
+ if (istmt->privileges & ACL_REFERENCES)
+ list = lappend(list, new_string_object("REFERENCES"));
+ if (istmt->privileges & ACL_TRIGGER)
+ list = lappend(list, new_string_object("TRIGGER"));
+ if (istmt->privileges & ACL_EXECUTE)
+ list = lappend(list, new_string_object("EXECUTE"));
+ if (istmt->privileges & ACL_USAGE)
+ list = lappend(list, new_string_object("USAGE"));
+ if (istmt->privileges & ACL_CREATE)
+ list = lappend(list, new_string_object("CREATE"));
+ if (istmt->privileges & ACL_CREATE_TEMP)
+ list = lappend(list, new_string_object("TEMPORARY"));
+ if (istmt->privileges & ACL_CONNECT)
+ list = lappend(list, new_string_object("CONNECT"));
16a.
Shouldn't this be trying to re-use code like the privilege_to_string()
mapping function already in aclchk.c to get all those ACL strings?
~
16b.
Is it correct that ACL_SET and ACL_ALTER_SYSTEM are missing?
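If reuse turns out not to be practical, even a small local mapping table
would make a missing bit easier to spot (a sketch only, not patch code):

static const struct
{
    AclMode     mask;
    const char *keyword;
} priv_map[] = {
    {ACL_INSERT, "INSERT"},
    {ACL_SELECT, "SELECT"},
    {ACL_UPDATE, "UPDATE"},
    {ACL_DELETE, "DELETE"},
    {ACL_TRUNCATE, "TRUNCATE"},
    {ACL_REFERENCES, "REFERENCES"},
    {ACL_TRIGGER, "TRIGGER"},
    {ACL_EXECUTE, "EXECUTE"},
    {ACL_USAGE, "USAGE"},
    {ACL_CREATE, "CREATE"},
    {ACL_CREATE_TEMP, "TEMPORARY"},
    {ACL_CONNECT, "CONNECT"},
    /* ACL_SET / ACL_ALTER_SYSTEM would be added here if they are wanted */
};

list = NIL;
for (int i = 0; i < (int) lengthof(priv_map); i++)
{
    if (istmt->privileges & priv_map[i].mask)
        list = lappend(list, new_string_object((char *) priv_map[i].keyword));
}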
~~~
17.
The coding style is inconsistent in this function...
For the same things - sometimes use the ternary operator; sometimes use if/else.
e.g.1
+ append_string_object(grantStmt, "%{grant_option}s",
+ istmt->grant_option ? "WITH GRANT OPTION" : "");
e.g.2
+ if (istmt->behavior == DROP_CASCADE)
+ append_string_object(grantStmt, "%{cascade}s", "CASCADE");
+ else
+ append_string_object(grantStmt, "%{cascade}s", "");
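e.g.2 could be written in the same style as e.g.1 (sketch):

append_string_object(grantStmt, "%{cascade}s",
                     istmt->behavior == DROP_CASCADE ? "CASCADE" : "");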
~~~
18. deparse_AlterOpFamily
+ tmpobj2 = new_objtree_VA("FOR ORDER BY", 0);
+ append_object_object(tmpobj2, "%{opfamily}D",
+ new_objtree_for_qualname_id(OperatorFamilyRelationId,
+ oper->sortfamily));
Why isn't something like this combined and written as a single
new_objtree_VA() call?
~~~
19. deparse_Type_Storage
+ tmpstr = psprintf("%s", str);
+
+ fmt = "STORAGE = %{value}s";
+
+ storage = new_objtree_VA(fmt, 2,
+ "clause", ObjTypeString, "storage",
+ "value", ObjTypeString, tmpstr);
19a.
What is the purpose of tmpstr? Seems unnecessary
~
19b.
What is the purpose of the separate 'fmt' var? Why not just pass the
format string as a literal parameter to new_objtree_VA()?
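i.e. something like (sketch):

storage = new_objtree_VA("STORAGE = %{value}s", 2,
                         "clause", ObjTypeString, "storage",
                         "value", ObjTypeString, str);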
~~~
20. deparse_CreateConversion
+ /* Add the DEFAULT clause */
+ append_string_object(ccStmt, "default",
+ conForm->condefault ? "DEFAULT" : "");
20a.
Is that code correct? I thought the fmt should look like
"%{default}s", otherwise won't the resulting string object have no
name?
~
20b.
Anyway, it does not seem to match what the preceding verbose syntax
comment says.
~~~
21.
+
+
+ /* Add the DEFAULT clause */
+ append_string_object(ccStmt, "default",
+ conForm->condefault ? "DEFAULT" : "");
+
+ tmpObj = new_objtree_for_qualname(conForm->connamespace,
NameStr(conForm->conname));
+ append_object_object(ccStmt, "CONVERSION %{identity}D", tmpObj);
+ append_string_object(ccStmt, "FOR %{source}L", (char *)
+ pg_encoding_to_char(conForm->conforencoding));
+ append_string_object(ccStmt, "TO %{dest}L", (char *)
+ pg_encoding_to_char(conForm->contoencoding));
+ append_object_object(ccStmt, "FROM %{function}D",
+ new_objtree_for_qualname_id(ProcedureRelationId,
+ conForm->conproc));
I don't really understand why all this is not written instead using a
single new_objtree_VA() call.
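For example, something along these lines (a sketch of the suggestion
only, not patch code; it assumes new_objtree_VA() accepts all five
substitutions at once):

ccStmt = new_objtree_VA("CREATE %{default}s CONVERSION %{identity}D "
                        "FOR %{source}L TO %{dest}L FROM %{function}D", 5,
                        "default", ObjTypeString,
                        conForm->condefault ? "DEFAULT" : "",
                        "identity", ObjTypeObject,
                        new_objtree_for_qualname(conForm->connamespace,
                                                 NameStr(conForm->conname)),
                        "source", ObjTypeString,
                        (char *) pg_encoding_to_char(conForm->conforencoding),
                        "dest", ObjTypeString,
                        (char *) pg_encoding_to_char(conForm->contoencoding),
                        "function", ObjTypeObject,
                        new_objtree_for_qualname_id(ProcedureRelationId,
                                                    conForm->conproc));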
~~~
22. deparse_CreateEnumStmt
+ enumtype = new_objtree("CREATE TYPE");
+ append_object_object(enumtype, "%{identity}D",
+ new_objtree_for_qualname_id(TypeRelationId,
+ objectId));
+
+ values = NIL;
+ foreach(cell, node->vals)
+ {
+ String *val = lfirst_node(String, cell);
+
+ values = lappend(values, new_string_object(strVal(val)));
+ }
+
+ append_array_object(enumtype, "AS ENUM (%{values:, }L)", values);
+ return enumtype;
Ditto. I don't really understand why all this is not written instead
using a single new_objtree_VA() call.
~~~
23. deparse_CreateExtensionStmt
+ extStmt = new_objtree("CREATE EXTENSION");
+
+ append_string_object(extStmt, "%{if_not_exists}s",
+ node->if_not_exists ? "IF NOT EXISTS" : "");
+
+ append_string_object(extStmt, "%{name}I", node->extname);
Ditto. I don't really understand why all this is not written instead
using a single new_objtree_VA() call.
~~~
24. deparse_FdwOptions
+ tmp = new_objtree_VA("OPTIONS", 0);
Isn't it better to call the other function instead of passing zero
params to this one?
~~~
25. deparse_CreateFdwStmt
25a.
+ /* add HANDLER clause */
+ if (fdwForm->fdwhandler == InvalidOid)
+ tmp = new_objtree_VA("NO HANDLER", 0);
+ else
Isn't it better to call the other function instead of passing zero
params to this one?
~
25b.
+ /* add VALIDATOR clause */
+ if (fdwForm->fdwvalidator == InvalidOid)
+ tmp = new_objtree_VA("NO VALIDATOR", 0);
Ditto #25a
~
25c.
Both above should use OidIsValid macro.
~~~
26. deparse_AlterFdwStmt
26a.
+ /* add HANDLER clause */
+ if (fdwForm->fdwhandler == InvalidOid)
+ tmp = new_objtree_VA("NO HANDLER", 0);
Ditto #25a
~
26b.
+ /* add VALIDATOR clause */
+ if (fdwForm->fdwvalidator == InvalidOid)
+ tmp = new_objtree_VA("NO VALIDATOR", 0);
Ditto #25a
~
26c.
Both above should use OidIsValid macro.
~~~
27. deparse_CreateRangeStmt
+ /* SUBTYPE */
+ tmp = new_objtree_for_qualname_id(TypeRelationId,
+ rangeForm->rngsubtype);
+ tmp = new_objtree_VA("SUBTYPE = %{type}D",
+ 2,
+ "clause", ObjTypeString, "subtype",
+ "type", ObjTypeObject, tmp);
+ definition = lappend(definition, new_object_object(tmp));
Reusing the 'tmp' variable seems a bit sneaky to me. Perhaps using
'tmp' and 'tmp_qualid' might be a more readable way to go here.
~~~
28. deparse_AlterEnumStmt
+ if (node->newValNeighbor)
+ {
+ append_string_object(alterEnum, "%{after_or_before}s",
+ node->newValIsAfter ? "AFTER" : "BEFORE");
+ append_string_object(alterEnum, "%{neighbour}L", node->newValNeighbor);
+ }
Has a mix of US and UK spelling of neighbor/neighbour?
------
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-04 09:36:48 |
| Message-ID: | CALDaNm2MDTFwWGG6xph8biwQf-ePfo7BLcXTU+j8s9aS0PdtBQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, 2 Nov 2022 at 05:13, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Mon, 31 Oct 2022 at 16:17, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Thu, 27 Oct 2022 at 16:02, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > On Thu, 27 Oct 2022 at 02:09, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > > >
> > > > > Adding support for deparsing of CREATE/ALTER/DROP LANGUAGE for ddl replication.
> > > >
> > > > Adding support for deparsing of:
> > > > COMMENT
> > > > ALTER DEFAULT PRIVILEGES
> > > > CREATE/DROP ACCESS METHOD
> > >
> > > Adding support for deparsing of:
> > > ALTER/DROP ROUTINE
> > >
> > > The patch also includes fixes for the following issues:
> >
>
Few comments:
1) If the user has specified a non-existing object, then we will throw
the wrong error.
+Datum
+publication_deparse_ddl_command_start(PG_FUNCTION_ARGS)
+{
+ EventTriggerData *trigdata;
+ char *command = psprintf("Drop table command start");
+ DropStmt *stmt;
+ ListCell *cell1;
+
+ if (!CALLED_AS_EVENT_TRIGGER(fcinfo))
+ elog(ERROR, "not fired by event trigger manager");
+
+ trigdata = (EventTriggerData *) fcinfo->context;
+ stmt = (DropStmt *) trigdata->parsetree;
+
+ /* extract the relid from the parse tree */
+ foreach(cell1, stmt->objects)
+ {
+ char relpersist;
+ Node *object = lfirst(cell1);
+ ObjectAddress address;
+ Relation relation = NULL;
+
+ address = get_object_address(stmt->removeType, object,
+                              &relation, AccessExclusiveLock,
+                              true);
+
+ relpersist = get_rel_persistence(address.objectId);
We could check whether relation is NULL after getting the address and
skip processing that object.
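A minimal sketch of such a guard, placed right after the
get_object_address() call (an illustration only; it tests the returned
address, assuming that with missing_ok = true an invalid address is
returned when the object does not exist):

/* Object is already gone; skip it rather than raising the wrong error */
if (!OidIsValid(address.objectId))
    continue;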
2) Materialized view handling is missing:
+ switch (rel->rd_rel->relkind)
+ {
+ case RELKIND_RELATION:
+ case RELKIND_PARTITIONED_TABLE:
+ reltype = "TABLE";
+ break;
+ case RELKIND_INDEX:
+ case RELKIND_PARTITIONED_INDEX:
+ reltype = "INDEX";
+ break;
+ case RELKIND_VIEW:
+ reltype = "VIEW";
+ break;
+ case RELKIND_COMPOSITE_TYPE:
+ reltype = "TYPE";
+ istype = true;
+ break;
We could use this scenario for debugging and verifying:
ALTER MATERIALIZED VIEW testschema.amv SET TABLESPACE regress_tblspace;
3) The internal re-adding of statistics during ALTER TABLE (readd statistics) is not handled:
+ case AT_DropIdentity:
+     tmpobj = new_objtree_VA("ALTER COLUMN %{column}I DROP IDENTITY", 2,
+                             "type", ObjTypeString, "drop identity",
+                             "column", ObjTypeString, subcmd->name);
+
+     append_string_object(tmpobj, "%{if_not_exists}s",
+                          subcmd->missing_ok ? "IF EXISTS" : "");
+
+     subcmds = lappend(subcmds, new_object_object(tmpobj));
+     break;
+ default:
+     elog(WARNING, "unsupported alter table subtype %d",
+          subcmd->subtype);
+     break;
+ }
We could use this scenario for debugging and verifying:
CREATE TABLE functional_dependencies (
filler1 TEXT,
filler2 NUMERIC,
a INT,
b TEXT,
filler3 DATE,
c INT,
d TEXT
)
WITH (autovacuum_enabled = off);
CREATE STATISTICS func_deps_stat (dependencies) ON a, b, c FROM
functional_dependencies;
TRUNCATE functional_dependencies;
ALTER TABLE functional_dependencies ALTER COLUMN c TYPE numeric;
4) The "ALTER SEQUENCE ... AS" option is not handled:
+ if (strcmp(elem->defname, "cache") == 0)
+     newelm = deparse_Seq_Cache(alterSeq, seqform, false);
+ else if (strcmp(elem->defname, "cycle") == 0)
+     newelm = deparse_Seq_Cycle(alterSeq, seqform, false);
+ else if (strcmp(elem->defname, "increment") == 0)
+     newelm = deparse_Seq_IncrementBy(alterSeq, seqform, false);
+ else if (strcmp(elem->defname, "minvalue") == 0)
+     newelm = deparse_Seq_Minvalue(alterSeq, seqform, false);
+ else if (strcmp(elem->defname, "maxvalue") == 0)
+     newelm = deparse_Seq_Maxvalue(alterSeq, seqform, false);
+ else if (strcmp(elem->defname, "start") == 0)
+     newelm = deparse_Seq_Startwith(alterSeq, seqform, false);
+ else if (strcmp(elem->defname, "restart") == 0)
+     newelm = deparse_Seq_Restart(alterSeq, seqdata);
+ else if (strcmp(elem->defname, "owned_by") == 0)
+     newelm = deparse_Seq_OwnedBy(alterSeq, objectId, false);
+ else
+     elog(ERROR, "invalid sequence option %s", elem->defname);
We could use this scenario for debugging and verifying:
ALTER SEQUENCE seq1 AS smallint;
5) alter table row level security is not handled:
+ case AT_DropIdentity:
+     tmpobj = new_objtree_VA("ALTER COLUMN %{column}I DROP IDENTITY", 2,
+                             "type", ObjTypeString, "drop identity",
+                             "column", ObjTypeString, subcmd->name);
+
+     append_string_object(tmpobj, "%{if_not_exists}s",
+                          subcmd->missing_ok ? "IF EXISTS" : "");
+
+     subcmds = lappend(subcmds, new_object_object(tmpobj));
+     break;
+ default:
+     elog(WARNING, "unsupported alter table subtype %d",
+          subcmd->subtype);
+     break;
We could use this scenario for debugging and verifying:
CREATE TABLE r1 (a int);
ALTER TABLE r1 FORCE ROW LEVEL SECURITY;
6) alter table add primary key is not handled:
+ case AT_DropIdentity:
+     tmpobj = new_objtree_VA("ALTER COLUMN %{column}I DROP IDENTITY", 2,
+                             "type", ObjTypeString, "drop identity",
+                             "column", ObjTypeString, subcmd->name);
+
+     append_string_object(tmpobj, "%{if_not_exists}s",
+                          subcmd->missing_ok ? "IF EXISTS" : "");
+
+     subcmds = lappend(subcmds, new_object_object(tmpobj));
+     break;
+ default:
+     elog(WARNING, "unsupported alter table subtype %d",
+          subcmd->subtype);
+     break;
We could use this scenario for debugging and verifying:
create table idxpart (a int) partition by range (a);
create table idxpart0 (like idxpart);
alter table idxpart0 add primary key (a);
alter table idxpart attach partition idxpart0 for values from (0) to (1000);
alter table only idxpart add primary key (a);
7) Code not updated based on new change:
7.a) identity_column should be removed from new_objtree_VA
+ case AT_AddIdentity:
+     {
+         AttrNumber  attnum;
+         Oid         seq_relid;
+         ObjTree    *seqdef;
+         ColumnDef  *coldef = (ColumnDef *) subcmd->def;
+
+         tmpobj = new_objtree_VA("ALTER COLUMN %{column}I ADD %{identity_column}s", 2,
+                                 "type", ObjTypeString, "add identity",
+                                 "column", ObjTypeString, subcmd->name);
+
+         attnum = get_attnum(RelationGetRelid(rel), subcmd->name);
+         seq_relid = getIdentitySequence(RelationGetRelid(rel), attnum, true);
+         seqdef = deparse_ColumnIdentity(seq_relid, coldef->identity, false);
+
+         append_object_object(tmpobj, "identity_column", seqdef);
7.b) identity_column should be changed to "%{identity_column}s" in
append_object_object
We could use this scenario for debugging and verifying:
CREATE TABLE itest4 (a int NOT NULL, b text);
ALTER TABLE itest4 ALTER COLUMN a ADD GENERATED ALWAYS AS IDENTITY;
8) SearchSysCache1 is copied twice; one of them should be removed
+ /*
+ * Lookup up object in the catalog, so we don't have to deal with
+ * current_user and such.
+ */
+
+ tp = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(objectId));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for user mapping %u",
objectId);
+
+ form = (Form_pg_user_mapping) GETSTRUCT(tp);
+
+ /*
+ * Lookup up object in the catalog, so we don't have to deal with
+ * current_user and such.
+ */
+
+ tp = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(objectId));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for user mapping %u",
objectId);
9) Create table with INCLUDING GENERATED not handled:
+ case AT_DropIdentity:
+     tmpobj = new_objtree_VA("ALTER COLUMN %{column}I DROP IDENTITY", 2,
+                             "type", ObjTypeString, "drop identity",
+                             "column", ObjTypeString, subcmd->name);
+
+     append_string_object(tmpobj, "%{if_not_exists}s",
+                          subcmd->missing_ok ? "IF EXISTS" : "");
+
+     subcmds = lappend(subcmds, new_object_object(tmpobj));
+     break;
+ default:
+     elog(WARNING, "unsupported alter table subtype %d",
+          subcmd->subtype);
+     break;
We could use this scenario for debugging and verifying:
CREATE TABLE gtest28a (a int, b int, c int, x int GENERATED ALWAYS
AS (b * 2) STORED);
CREATE TABLE gtest28b (LIKE gtest28a INCLUDING GENERATED);
Regards,
Vignesh
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-08 04:03:34 |
| Message-ID: | CAFPTHDYMs9zRUHkBZiCSVRBuC7UOg8yn11vd8QH_FsCfUnUZcw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, 27 Oct 2022 at 16:02, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> 2) This function should handle "alter procedure" too:
> +/*
> + * Deparse an AlterFunctionStmt (ALTER FUNCTION/ROUTINE)
> + *
> + * Given a function OID and the parse tree that created it, return the JSON
> + * blob representing the alter command.
> + */
> +static ObjTree *
> +deparse_AlterFunction(Oid objectId, Node *parsetree)
> +{
> + AlterFunctionStmt *node = (AlterFunctionStmt *) parsetree;
> + ObjTree *alterFunc;
> + ObjTree *sign;
> + HeapTuple procTup;
>
> Currently "alter procedure" statement are replicated as "alter
> function" statements in the subscriber.
Fixed this.
> 3) In few of the extensions we execute "alter operator family" like in
> hstore extension, we should exclude replicating "alter operator
> family" when create extension is in progress:
> /* Don't deparse SQL commands generated while creating extension */
> if (cmd->in_extension)
> return NULL;
>
> The above check should be included in the below code, else the create
> extension statment will fail as internal statements will be executed:
>
> +static ObjTree *
> +deparse_AlterOpFamily(CollectedCommand *cmd)
> +{
> + ObjTree *alterOpFam;
> + AlterOpFamilyStmt *stmt = (AlterOpFamilyStmt *) cmd->parsetree;
> + HeapTuple ftp;
> + Form_pg_opfamily opfForm;
> + List *list;
> + ListCell *cell;
> +
> + ftp = SearchSysCache1(OPFAMILYOID,
> +
> ObjectIdGetDatum(cmd->d.opfam.address.objectId));
> + if (!HeapTupleIsValid(ftp))
> + elog(ERROR, "cache lookup failed for operator family %u",
> + cmd->d.opfam.address.objectId);
> + opfForm = (Form_pg_opfamily) GETSTRUCT(ftp);
> +
Fixed this.
> 4) This if...else can be removed, the nspid and typname can be handled
> for others in default. *nspid can be set to InvalidOid at the
> beginning.
> + if (type_oid == INTERVALOID ||
> + type_oid == TIMESTAMPOID ||
> + type_oid == TIMESTAMPTZOID ||
> + type_oid == TIMEOID ||
> + type_oid == TIMETZOID)
> + {
> + switch (type_oid)
> + {
> + case INTERVALOID:
> + *typname = pstrdup("INTERVAL");
> + break;
> + case TIMESTAMPTZOID:
> + if (typemod < 0)
> + *typname = pstrdup("TIMESTAMP
> WITH TIME ZONE");
> + else
> + /* otherwise, WITH TZ is added
> by typmod. */
> + *typname = pstrdup("TIMESTAMP");
> + break;
> + case TIMESTAMPOID:
> + *typname = pstrdup("TIMESTAMP");
> + break;
> + case TIMETZOID:
> + if (typemod < 0)
> + *typname = pstrdup("TIME WITH
> TIME ZONE");
> + else
> + /* otherwise, WITH TZ is added
> by typmod. */
> + *typname = pstrdup("TIME");
> + break;
> + case TIMEOID:
> + *typname = pstrdup("TIME");
> + break;
> + }
> + *nspid = InvalidOid;
> + }
> + else
> + {
> + /*
> + * No additional processing is required for other
> types, so get the
> + * type name and schema directly from the catalog.
> + */
> + *nspid = typeform->typnamespace;
> + *typname = pstrdup(NameStr(typeform->typname));
> + }
>
Changed this.
> 5) The following includes are not required in ddl_deparse.c:
> #include "catalog/pg_attribute.h"
> #include "catalog/pg_class.h"
> #include "lib/ilist.h"
> #include "nodes/makefuncs.h"
> #include "nodes/parsenodes.h"
> #include "utils/memutils.h"
>
Fixed this.
> 6) Inconsistent error reporting:
> In few places elog is used and in few places ereport is used:
> + HeapTuple polTup = get_catalog_object_by_oid(polRel,
> Anum_pg_policy_oid, policyOid);
> + Form_pg_policy polForm;
> +
> + if (!HeapTupleIsValid(polTup))
> + elog(ERROR, "cache lookup failed for policy %u", policyOid);
>
>
> + char *rolename;
> +
> + roltup = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleoid));
> + if (!HeapTupleIsValid(roltup))
> + ereport(ERROR,
> + (errcode(ERRCODE_UNDEFINED_OBJECT),
> + errmsg("role with OID %u does
> not exist", roleoid)));
>
> We can try to use the same style of error reporting.
>
Changed all errors to elog
> 8) Inclusion ordering in ddl_deparse.c:
> 8.a) The following should be slightly reordered
> +#include "access/amapi.h"
> +#include "access/table.h"
> +#include "access/relation.h"
>
> 8.b) The following should be slightly reordered
> +#include "postgres.h"
> +#include "tcop/ddl_deparse.h"
> +#include "access/amapi.h"
>
> 9) In few places multi line comment can be changed to single line comment:
> 9.a)
> + /*
> + * Fetch the pg_class tuple of the index relation
> + */
>
> 9.b)
> + /*
> + * Fetch the pg_am tuple of the index' access method
> + */
>
> 9.c)
> + /*
> + * Reject unsupported case right away.
> + */
>
> 10) This should also specify ROUTINE in the comment
> /*
> * Verbose syntax
> *
> * ALTER FUNCTION %{signature}s %{definition: }s
> */
> alterFunc = new_objtree_VA(node->objtype == OBJECT_ROUTINE ?
> "ALTER ROUTINE" : "ALTER FUNCTION", 0);
>
> 11) This can be changed in alphabetical order(collation first and then column):
> 11.a)
> + case OBJECT_COLUMN:
> + return "COLUMN";
> + case OBJECT_COLLATION:
> + return "COLLATION";
> + case OBJECT_CONVERSION:
> + return "CONVERSION";
>
> 11.b) similarly here:
> case OBJECT_FDW:
> return "FOREIGN DATA WRAPPER";
> case OBJECT_FOREIGN_SERVER:
> return "SERVER";
> case OBJECT_FOREIGN_TABLE:
> return "FOREIGN TABLE";
>
> 11.c) similarly here:
> case OBJECT_FUNCTION:
> return "FUNCTION";
> case OBJECT_ROUTINE:
> return "ROUTINE";
> case OBJECT_INDEX:
> return "INDEX";
>
> 11.d) similarly here:
> case OBJECT_OPCLASS:
> return "OPERATOR CLASS";
> case OBJECT_OPERATOR:
> return "OPERATOR";
> case OBJECT_OPFAMILY:
> return "OPERATOR FAMILY";
>
> 11.e) similarly here:
> case OBJECT_TRIGGER:
> return "TRIGGER";
> case OBJECT_TSCONFIGURATION:
> return "TEXT SEARCH CONFIGURATION";
>
> /*
> * case OBJECT_TSCONFIG_MAPPING:
> * return "TEXT SEARCH CONFIGURATION MAPPING";
> */
> case OBJECT_TSDICTIONARY:
> return "TEXT SEARCH DICTIONARY";
> case OBJECT_TSPARSER:
> return "TEXT SEARCH PARSER";
> case OBJECT_TSTEMPLATE:
> return "TEXT SEARCH TEMPLATE";
> case OBJECT_TYPE:
> return "TYPE";
>
Fixed this.
> 12) new_objtree can be used instead of new_objtree_VA when there is no
> arguments, one additional check can be avoided
>
> 12.a) alterFunc = new_objtree_VA(node->objtype == OBJECT_ROUTINE ?
> "ALTER ROUTINE" : "ALTER FUNCTION", 0);
>
> 12.b) ObjTree *tmpobj = new_objtree_VA("", 0);
>
> 12.c) tmpobj = new_objtree_VA(strVal(defel->arg), 0);
>
> 12.d) tmpobj = new_objtree_VA("ROWS", 0);
>
> 12.e) grantStmt = new_objtree_VA(fmt, 0);
>
> 12.f) tmp = new_objtree_VA("ALL PRIVILEGES", 0);
>
> 12.g) tmpobj2 = new_objtree_VA("FOR ORDER BY", 0);
>
> 12.h) composite = new_objtree_VA("CREATE TYPE", 0);
>
> 12.i) tmp = new_objtree_VA("OPTIONS", 0);
>
> 12.j) tmp = new_objtree_VA("NO HANDLER", 0);
>
> 12.k) .... similarly in few more places .....
>
Fixed these.
On Thu, Nov 3, 2022 at 1:51 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> Few comments:
> 1) Empty () should be appended in case if there are no table elements:
> + tableelts = deparse_TableElements(relation,
> node->tableElts, dpcontext,
> +
> false, /* not typed table */
> +
> false); /* not composite */
> + tableelts = obtainConstraints(tableelts, objectId, InvalidOid);
> +
> + append_array_object(createStmt, "(%{table_elements:,
> }s)", tableelts);
>
> This is required for:
> CREATE TABLE ihighway () INHERITS (road);
>
Fixed this.
> 2.a)
> Here cell2 will be of type RoleSpec, the below should be changed:
> + foreach(cell2, (List *) opt->arg)
> + {
> + String *val = lfirst_node(String, cell2);
> + ObjTree *obj =
> new_objtree_for_role(strVal(val));
> +
> + roles = lappend(roles, new_object_object(obj));
> + }
>
> to:
> foreach(cell2, (List *) opt->arg)
> {
> RoleSpec *rolespec = lfirst(cell2);
> ObjTree *obj = new_objtree_for_rolespec(rolespec);
>
> roles = lappend(roles, new_object_object(obj));
> }
>
Fixed this.
> This change is required for:
> ALTER DEFAULT PRIVILEGES FOR ROLE regress_selinto_user REVOKE INSERT
> ON TABLES FROM regress_selinto_user;
>
> 2.b) After the above change the following function cna be removed:
> +/*
> + * Helper routine for %{}R objects, with role specified by name.
> + */
> +static ObjTree *
> +new_objtree_for_role(char *rolename)
> +{
> + ObjTree *role;
> +
> + role = new_objtree_VA(NULL,2,
> + "is_public",
> ObjTypeBool, strcmp(rolename, "public") == 0,
> + "rolename",
> ObjTypeString, rolename);
> + return role;
> +}
>
Fixed this.
I've addressed a few comments from Vignesh; there are quite a few more
comments remaining. I will address them in my next patch.
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v34-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v34-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v34-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 316.8 KB |
| v34-0002-Support-DDL-replication.patch | application/octet-stream | 132.8 KB |
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-10 04:04:49 |
| Message-ID: | CAFPTHDbhXwAfb8jJiMk1rDEz6zhCEXXxfJxau2S-fC_Bc12MJw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, Nov 8, 2022 at 3:03 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> I've addressed a few comments from Vignesh, there are quite a few more
> comments remaining. I will update them in my next patch.
>
Fixed a test case that was failing in my previous patch set.
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v35-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v35-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v35-0002-Support-DDL-replication.patch | application/octet-stream | 133.2 KB |
| v35-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 316.8 KB |
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-11 04:47:24 |
| Message-ID: | CAHut+Pu6H6hY0JJNNRCRmFpM_3817z=0xjm-_ibP+cL85pBOpQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Here are more review comments for the v32-0001 file ddl_deparse.c
This completes my first review pass over this overly large file.
This review has taken a long time, so for any of my review comments
(in this and previous posts) that get rejected, please reply citing the
rejected reference numbers; that will save me from spending multiple
days in a future review trying to reconcile what was addressed with
what was not. TIA.
*** NOTE - my review post became too big, so I split it into smaller parts.
THIS IS PART 1 OF 4.
======
src/backend/commands/ddl_deparse.c
G.1. GENERAL _VA args wrapping
+ tmp = new_objtree_VA("WITH GRANT OPTION",
+ 1, "present", ObjTypeBool,
+ stmt->action->grant_option);
In general, I think all these _VA() style function calls are easier to
read if you can arrange to put each of the argument names on a new
line instead of just wrapping them randomly.
So the above would be better as:
tmp = new_objtree_VA("WITH GRANT OPTION", 1,
"present", ObjTypeBool, stmt->action->grant_option);
Please search/modify all cases of new_objtree_VA like this.
~~~
G.2. GENERAL - "type" object
There are many functions that insert a "type" object for some purpose:
e.g.
+ tmpobj = new_objtree_VA("DETACH PARTITION %{partition_identity}D FINALIZE", 2,
+ "type", ObjTypeString, "detach partition finalize",
+ "partition_identity", ObjTypeObject,
+ new_objtree_for_qualname_id(RelationRelationId,
+ sub->address.objectId));
e.g.
+ tmpobj = new_objtree_VA(fmtstr, 2,
+ "type", ObjTypeString, "add column",
+ "definition", ObjTypeObject, tree);
I'm not sure yet what these "type" objects are used for, but I felt
that these unsubstituted values should look slightly more like enum
values, and slightly less like real SQL syntax.
For example - maybe do like this (for the above):
"detach partition finalize" -> "type_detach_partition_finalize"
"add column" -> "type_add_column"
etc.
~~~
G.3. GENERAL - JSON deparsed structures should be documented
AFAICT there are mixtures of different JSON structure styles at play
in this module. Sometimes there are trees with names and sometimes
not, sometimes there are "present" objects and sometimes not.
Sometimes entire trees seemed unnecessary to me. It feels quite
arbitrary in places but it's quite hard to compare them because
everything is spread across 9000+ lines.
IMO all these deparse structures ought to be documented. Then I think
it will become apparent that lots of them are inconsistent with the
others. Even if such documentation is ultimately not needed by
end-users, I think it would be a very valuable internal design
accompaniment to this module, and it would help a lot for
reviews/maintenance/bug prevention etc. Better late than never.
~~~
G.4 GENERAL - Underuse of _VA() function.
(Probably I've mentioned this before in previous review comments, but
I keep encountering this many times).
The JSON is built up part by part, with objects appended one at a time.
That was probably easier to think about during coding, but OTOH I think
this style is often unnecessary. IMO most of the time the function can
be far simpler just by gathering all the necessary values and deparsing
the complete format with a single big new_objtree_VA() call. I think it
could also shave hundreds of lines of code from the module.
~~~
G.5 GENERAL - Inconsistent function comment wording.
The function comments are worded in different ways...
"Given a XXX OID and the parse tree that created it, return an ObjTree
representing the creation command."
versus
"Given a XXX OID and the parse tree that created it, return the JSON
blob representing the creation command."
Please use consistent wording throughout.
~~~
G.6 GENERAL - present=false
There are many calls that do like:
append_bool_object(tmpobj, "present", false);
I was thinking the code would be cleaner if there was a wrapper function like:
static void
append_not_present(ObjTree *objTree)
{
    append_bool_object(objTree, "present", false);
}
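
Then each of those call sites would collapse to something like this
(sketch only, assuming the wrapper above):

append_not_present(tmpobj);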
~~~
G.7 GENERAL - psprintf format strings
There are quite a few places where the format string is
pre-constructed using psprintf.
e.g.
+ fmt = psprintf("ALTER %s %%{identity}s OWNER TO %%{newowner}I",
+ stringify_objtype(node->objectType));
+
+ ownerStmt = new_objtree_VA(fmt, 2,
+ "identity", ObjTypeString,
+ getObjectIdentity(&address, false),
+ "newowner", ObjTypeString,
+ get_rolespec_name(node->newowner));
It's not entirely clear to me why this kind of distinction is even
made, or what rules govern the choice. AFAICT the same result could be
achieved by using another string substitution marker, so why not do it
that way instead of mixing different styles? IMO many/most of the
psprintf calls can be removed.
e.g. I mean something like this (for the above example):
fmt = "ALTER %{obj_type}s %{identity}s OWNER TO %{newowner}I";
ownerStmt = new_objtree_VA(fmt, 3,
"obj_type", ObjTypeString, stringify_objtype(node->objectType),
"identity", ObjTypeString, getObjectIdentity(&address, false),
"newowner", ObjTypeString, get_rolespec_name(node->newowner));
~~~
G.8 GENERAL - Inconsistent OID/oid in error messages.
errmsg("role with OID %u does not exist", roleoid)));
elog(ERROR, "cache lookup failed for collation with OID %u", objectId);
elog(ERROR, "cache lookup failure for function with OID %u",
elog(ERROR, "cache lookup failed for schema with OID %u",
errmsg("role with OID %u does not exist", istmt->grantor_uid)));
elog(ERROR, "cache lookup failed for operator with OID %u", objectId);
elog(ERROR, "cache lookup failed for type with OID %u", objectId);
elog(ERROR, "cache lookup failed for conversion with OID %u", objectId);
elog(ERROR, "cache lookup failed for extension with OID %u",
elog(ERROR, "cache lookup failed for extension with OID %u",
elog(ERROR, "cache lookup failed for cast with OID %u", objectId);
elog(ERROR, "cache lookup failed for domain with OID %u", objectId);
elog(ERROR, "cache lookup failure for function with OID %u",
elog(ERROR, "cache lookup failure for language with OID %u",
elog(ERROR, "null prosrc in function with OID %u", objectId);
elog(ERROR, "cache lookup failed for opclass with OID %u", opcoid);
elog(ERROR, "cache lookup failed for operator family with OID %u",
opcForm->opcfamily);
elog(ERROR, "cache lookup failed for operator family with OID %u", objectId);
elog(ERROR, "cache lookup failed for domain with OID %u",
elog(ERROR, "cache lookup failed for collation with OID %u", objectId);
elog(ERROR, "cache lookup failed for operator with OID %u", objectId);
elog(ERROR, "cache lookup failed for type with OID %u", objectId);
elog(ERROR, "cache lookup failed for text search parser with OID %u",
elog(ERROR, "cache lookup failed for text search dictionary " "with
OID %u", objectId);
elog(ERROR, "cache lookup failed for text search template with OID %u",
elog(ERROR, "cache lookup failed for text search dictionary " "with
OID %u", objectId);
elog(ERROR, "cache lookup failed for opclass with OID %u",
elog(ERROR, "cache lookup failed for operator family with OID %u",
elog(ERROR, "cache lookup failure for transform with OID %u",
elog(ERROR, "cache lookup failure for language with OID %u",
elog(ERROR, "cache lookup failure for function with OID %u",
elog(ERROR, "cache lookup failure for function with OID %u",
elog(ERROR, "cache lookup failed for rewrite rule for view with OID
%u", viewoid)
elog(ERROR, "cache lookup failed for range with type oid %u",
elog(ERROR, "cache lookup failed for rewrite rule with oid %u",
G.8a.
Most are uppercase 'OID'. A few are lowercase 'oid'
~
G.8b.
There is a mixture of "cache lookup failed" and "cache lookup failure"
-- should all be the same.
~
G.8c.
A few above (e.g. role) have a different message text. Shouldn't those
also be "cache lookup failed"?
~~~
G.9 GENERAL - ObjTree variables
Often the ObjTree variable (for the deparse_XXX function return) is
given the name of the statement it is creating.
Although it is good to be descriptive, often there is no need for long
variable names (e.g. 'createTransform' etc), because there is no
ambiguity anyway and it just makes for extra code characters and
unnecessary wrapping. IMO it would be better to just call everything
some short but *consistent* name across every function -- like 'stmt'
or 'json_ddl' or 'root' or 'ret' ... or whatever.
------
Kind Regards,
Peter Smith.
Fujitsu Australia.
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-11 05:09:04 |
| Message-ID: | CAHut+PtHvH66uvih23Rm4Ajm2suHEVv1W5NJ1Y8tsQrn=u9qhQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Nov 11, 2022 at 3:47 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are more review comments for the v32-0001 file ddl_deparse.c
>
> *** NOTE - my review post became too big, so I split it into smaller parts.
THIS IS PART 2 OF 4.
=======
src/backend/commands/ddl_deparse.c
1. deparse_AlterExtensionStmt
+/*
+ * Deparse an AlterExtensionStmt (ALTER EXTENSION .. UPDATE TO VERSION)
+ *
+ * Given an extension OID and a parse tree that modified it, return an ObjTree
+ * representing the alter type.
+ */
+static ObjTree *
+deparse_AlterExtensionStmt(Oid objectId, Node *parsetree)
Spurious blank space before "OID"
~
2.
+ ObjTree *stmt;
+ ObjTree *tmp;
+ List *list = NIL;
+ ListCell *cell;
Variable 'tmp' can be declared only in the scope that it is used.
~
3.
+ foreach(cell, node->options)
+ {
+ DefElem *opt = (DefElem *) lfirst(cell);
+
+ if (strcmp(opt->defname, "new_version") == 0)
+ {
+ tmp = new_objtree_VA("TO %{version}L", 2,
+ "type", ObjTypeString, "version",
+ "version", ObjTypeString, defGetString(opt));
+ list = lappend(list, new_object_object(tmp));
+ }
+ else
+ elog(ERROR, "unsupported option %s", opt->defname);
+ }
It seems strange that this code adds new versions to a list. How can
there be multiple new versions? That does not seem compatible with the
command syntax [1].
------
4. deparse_CreateCastStmt
+ initStringInfo(&func);
+ appendStringInfo(&func, "%s(",
+ quote_qualified_identifier(get_namespace_name(funcForm->pronamespace),
+ NameStr(funcForm->proname)));
+ for (i = 0; i < funcForm->pronargs; i++)
+ appendStringInfoString(&func,
+ format_type_be_qualified(funcForm->proargtypes.values[i]));
+ appendStringInfoChar(&func, ')');
Is this correct, or should there be some separators (e.g. commas)
between multiple arg-types?
------
5. deparse_AlterDefaultPrivilegesStmt
+
+static ObjTree *
+deparse_AlterDefaultPrivilegesStmt(CollectedCommand *cmd)
Missing function comment
~
6.
+ schemas = lappend(schemas,
+ new_string_object(strVal(val)));
Unnecessary wrapping.
~
7.
+ /* Add the IN SCHEMA clause, if any */
+ tmp = new_objtree("IN SCHEMA");
+ append_array_object(tmp, "%{schemas:, }I", schemas);
+ if (schemas == NIL)
+ append_bool_object(tmp, "present", false);
+ append_object_object(alterStmt, "%{in_schema}s", tmp);
+
+ /* Add the FOR ROLE clause, if any */
+ tmp = new_objtree("FOR ROLE");
+ append_array_object(tmp, "%{roles:, }R", roles);
+ if (roles == NIL)
+ append_bool_object(tmp, "present", false);
+ append_object_object(alterStmt, "%{for_roles}s", tmp);
I don't really understand why the logic prefers to add a whole new
empty tree with "present: false" versus just adding nothing at all
unless it is relevant.
~
8.
+ if (stmt->action->is_grant)
+ grant = new_objtree("GRANT");
+ else
+ grant = new_objtree("REVOKE");
+
+ /* add the GRANT OPTION clause for REVOKE subcommand */
+ if (!stmt->action->is_grant)
+ {
+ tmp = new_objtree_VA("GRANT OPTION FOR",
+ 1, "present", ObjTypeBool,
+ stmt->action->grant_option);
+ append_object_object(grant, "%{grant_option}s", tmp);
+ }
That 2nd 'if' can just be combined with the 'else' logic of the prior if.
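e.g. something like this (just a sketch, reusing the variables from the
quoted fragment):

if (stmt->action->is_grant)
    grant = new_objtree("GRANT");
else
{
    grant = new_objtree("REVOKE");

    /* The GRANT OPTION clause only applies to the REVOKE form */
    tmp = new_objtree_VA("GRANT OPTION FOR", 1,
                         "present", ObjTypeBool, stmt->action->grant_option);
    append_object_object(grant, "%{grant_option}s", tmp);
}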
~
9.
+ Assert(priv->cols == NIL);
+ privs = lappend(privs,
+ new_string_object(priv->priv_name));
Unnecessary wrapping.
------
10. deparse_AlterTableStmt
Maybe this function name should be different because it is not only
for TABLEs but also serves for INDEX, VIEW, TYPE, etc
~
11.
AFAICT every case in the switch (subcmd->subtype) is doing subcmds =
lappend(subcmds, new_object_object(tmpobj));
Just doing this in common code at the end might be an easy way to
remove ~50 lines of duplicate code.
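i.e. something along these lines (a sketch only; the AT_AddColumn body
is just the fragment quoted in G.2, and every other subtype likewise
only builds tmpobj):

switch (subcmd->subtype)
{
    case AT_AddColumn:
        tmpobj = new_objtree_VA(fmtstr, 2,
                                "type", ObjTypeString, "add column",
                                "definition", ObjTypeObject, tree);
        break;

        /* ... every other subtype likewise only builds tmpobj ... */
}

/* Append once here, instead of repeating this in every case */
subcmds = lappend(subcmds, new_object_object(tmpobj));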
------
12. deparse_ColumnDef
+ * NOT NULL constraints in the column definition are emitted directly in the
+ * column definition by this routine; other constraints must be emitted
+ * elsewhere (the info in the parse node is incomplete anyway.).
+ */
+static ObjTree *
+deparse_ColumnDef(Relation relation, List *dpcontext, bool composite,
+ ColumnDef *coldef, bool is_alter, List **exprs)
"anyway.)." -> "anyway)."
~
13.
+ /* USING clause */
+ tmpobj = new_objtree("COMPRESSION");
+ if (coldef->compression)
+ append_string_object(tmpobj, "%{compression_method}I", coldef->compression);
+ else
+ {
+ append_null_object(tmpobj, "%{compression_method}I");
+ append_bool_object(tmpobj, "present", false);
+ }
Why is it necessary to specify a NULL compression method if the entire
"COMPRESSION" is anyway flagged as present=false?
~
14.
+ foreach(cell, coldef->constraints)
+ {
+ Constraint *constr = (Constraint *) lfirst(cell);
+
+ if (constr->contype == CONSTR_NOTNULL)
+ saw_notnull = true;
+ }
Why not break immediately from this loop the first time you find
'saw_notnull' true?
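i.e. (sketch):

foreach(cell, coldef->constraints)
{
    Constraint *constr = (Constraint *) lfirst(cell);

    if (constr->contype == CONSTR_NOTNULL)
    {
        saw_notnull = true;
        break;          /* no need to scan the remaining constraints */
    }
}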
~~~
15.
+ tmpobj = new_objtree("DEFAULT");
+ if (attrForm->atthasdef)
+ {
+ char *defstr;
+
+ defstr = RelationGetColumnDefault(relation, attrForm->attnum,
+ dpcontext, exprs);
+
+ append_string_object(tmpobj, "%{default}s", defstr);
+ }
+ else
+ append_bool_object(tmpobj, "present", false);
+ append_object_object(column, "%{default}s", tmpobj);
Something seems a bit strange here. It looks like there are formats
called "%{default}s" at 2 levels in this tree, so will it cause a
hierarchy of objects with the same name?
------
16. deparse_ColumnIdentity
+ column = new_objtree("");
+
+ if (!OidIsValid(seqrelid))
+ {
+ append_bool_object(column, "present", false);
+ return column;
+ }
I don't really understand the point of making empty tree structures
for not "present" elements. IIUC this is just going to make the tree
bigger for no reason and all these not "present" branches will be
ultimately thrown away, right? I guess the justification is that it
might be for debugging/documentation but that does not really stand up
in this case because it seems like just a nameless tree here.
------
17. deparse_CreateDomain
+ createDomain = new_objtree("CREATE");
+
+ append_object_object(createDomain,
+ "DOMAIN %{identity}D AS",
+ new_objtree_for_qualname_id(TypeRelationId,
+ objectId));
+ append_object_object(createDomain,
+ "%{type}T",
+ new_objtree_for_type(typForm->typbasetype, typForm->typtypmod));
+
+ if (typForm->typnotnull)
+ append_string_object(createDomain, "%{not_null}s", "NOT NULL");
+ else
+ append_string_object(createDomain, "%{not_null}s", "");
17a.
I don't understand why this is not just a single _VA() call instead of
spread over multiple append_objects like this.
~
17b.
In other places, something like the "%{not_null}s" is done with a
ternary operator instead of the excessive if/else.
------
18. deparse_CreateFunction
+ if (isnull)
+ probin = NULL;
+ else
+ {
+ probin = TextDatumGetCString(tmpdatum);
+ if (probin[0] == '\0' || strcmp(probin, "-") == 0)
+ probin = NULL;
+ }
Maybe it is simpler to assign probin = NULL where it is declared; then
here you only need to handle the !isnull case.
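e.g. (sketch):

char       *probin = NULL;      /* stays NULL for the isnull case */

/* ... later, at the point of the quoted fragment ... */
if (!isnull)
{
    probin = TextDatumGetCString(tmpdatum);
    if (probin[0] == '\0' || strcmp(probin, "-") == 0)
        probin = NULL;
}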
~
19.
+ append_string_object(createFunc, "%{or_replace}s",
+ node->replace ? "OR REPLACE" : "");
It is not clear to me what the point of such code is - I mean, if
node->replace is false why append at all? Why not use
append_format_string() instead?
My guess is that this way is preferred to simplify the calling code,
knowing that a "" value will just do nothing anyway - but it seems an
overcomplicated way to do it.
~
20.
+ typarray = palloc(list_length(node->parameters) * sizeof(Oid));
+ if (list_length(node->parameters) > procForm->pronargs)
+ {
+ Datum alltypes;
+ Datum *values;
+ bool *nulls;
+ int nelems;
+
+ alltypes = SysCacheGetAttr(PROCOID, procTup,
+ Anum_pg_proc_proallargtypes, &isnull);
+ if (isnull)
+ elog(ERROR, "NULL proallargtypes, but more parameters than args");
+ deconstruct_array(DatumGetArrayTypeP(alltypes),
+ OIDOID, 4, 't', 'i',
+ &values, &nulls, &nelems);
+ if (nelems != list_length(node->parameters))
+ elog(ERROR, "mismatched proallargatypes");
+ for (i = 0; i < list_length(node->parameters); i++)
+ typarray[i] = values[i];
+ }
+ else
+ {
+ for (i = 0; i < list_length(node->parameters); i++)
+ typarray[i] = procForm->proargtypes.values[i];
+ }
The list_length(node->parameters) is used multiple times here; it
might have been cleaner code to assign that to some local variable.
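e.g. (a sketch only; the body of the first branch stays exactly as in
the quoted code, just using the new variable):

int         nparams = list_length(node->parameters);

typarray = palloc(nparams * sizeof(Oid));
if (nparams > procForm->pronargs)
{
    /* deconstruct proallargtypes as in the quoted code, using nparams */
}
else
{
    for (i = 0; i < nparams; i++)
        typarray[i] = procForm->proargtypes.values[i];
}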
~
21.
+ * Note that %{name}s is a string here, not an identifier; the reason
+ * for this is that an absent parameter name must produce an empty
+ * string, not "", which is what would happen if we were to use
+ * %{name}I here. So we add another level of indirection to allow us
+ * to inject a "present" parameter.
+ */
The above comment says:
must produce an empty string, not ""
I didn't get the point - what is the difference between an empty string and ""?
~
22.
+ append_string_object(paramobj, "%{mode}s",
+ param->mode == FUNC_PARAM_IN ? "IN" :
+ param->mode == FUNC_PARAM_OUT ? "OUT" :
+ param->mode == FUNC_PARAM_INOUT ? "INOUT" :
+ param->mode == FUNC_PARAM_VARIADIC ? "VARIADIC" :
+ "IN");
There doesn't seem to be much point to test for param->mode ==
FUNC_PARAM_IN here since "IN" is the default mode anyway.
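i.e. (sketch):

append_string_object(paramobj, "%{mode}s",
                     param->mode == FUNC_PARAM_OUT ? "OUT" :
                     param->mode == FUNC_PARAM_INOUT ? "INOUT" :
                     param->mode == FUNC_PARAM_VARIADIC ? "VARIADIC" :
                     "IN");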
~
23.
+ name = new_objtree("");
+ append_string_object(name, "%{name}I",
+ param->name ? param->name : "NULL");
+
+ append_bool_object(name, "present",
+ param->name ? true : false);
IIUC it is uncommon to inject a "present" object if it was "true", so
why do it like that here?
~
24.
+ append_format_string(tmpobj, "(");
+ append_array_object(tmpobj, "%{arguments:, }s", params);
+ append_format_string(tmpobj, ")");
Is it necessary to do that in 3 lines? IIUC it would be the same if
the parens were just included in the append_array_object format,
right?
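i.e. (sketch):

append_array_object(tmpobj, "(%{arguments:, }s)", params);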
~
25.
+ if (procForm->prosupport)
+ {
+ Oid argtypes[1];
+
+ /*
+ * We should qualify the support function's name if it wouldn't be
+ * resolved by lookup in the current search path.
+ */
+ argtypes[0] = INTERNALOID;
Might as well just declare this as:
Oid argtypes[] = { INTERNALOID };
------
26. deparse_CreateOpClassStmt
+
+ stmt = new_objtree_VA("CREATE OPERATOR CLASS %{identity}D", 1,
+ "identity", ObjTypeObject,
+ new_objtree_for_qualname(opcForm->opcnamespace,
+ NameStr(opcForm->opcname)));
+
+ /* Add the DEFAULT clause */
+ append_string_object(stmt, "%{default}s",
+ opcForm->opcdefault ? "DEFAULT" : "");
+
+ /* Add the FOR TYPE clause */
+ append_object_object(stmt, "FOR TYPE %{type}T",
+ new_objtree_for_type(opcForm->opcintype, -1));
+
+ /* Add the USING clause */
+ append_string_object(stmt, "USING %{amname}I",
get_am_name(opcForm->opcmethod));
This can all be done just as a single VA call I think.
~
27.
+ append_format_string(tmpobj, "(");
+ append_array_object(tmpobj, "%{argtypes:, }T", arglist);
+ append_format_string(tmpobj, ")");
AFAIK this can just be done by a single call that includes the parens in
the format string of append_array_object.
------
28. deparse_CreatePolicyStmt
+
+static ObjTree *
+deparse_CreatePolicyStmt(Oid objectId, Node *parsetree)
Missing function comment.
~
29.
+ /* Add the rest of the stuff */
+ add_policy_clauses(policy, objectId, node->roles, !!node->qual,
+ !!node->with_check);
The !! to cast the pointer parameter to boolean is cute, but IIUC that
is not commonly used in the PG source. Maybe it is more conventional
to just pass node->qual != NULL etc?
------
30. deparse_AlterPolicyStmt
+
+static ObjTree *
+deparse_AlterPolicyStmt(Oid objectId, Node *parsetree)
Missing function comment.
~
31.
+ /* Add the rest of the stuff */
+ add_policy_clauses(policy, objectId, node->roles, !!node->qual,
+ !!node->with_check);
The !! to cast the pointer parameter to boolean is cute, but IIUC that
technique is not commonly used in the PG source. Maybe it is more
conventional to just pass node->qual != NULL etc?
------
32. deparse_CreateSchemaStmt
+ else
+ {
+ append_null_object(auth, "%{authorization_role}I ");
+ append_bool_object(auth, "present", false);
+ }
32a.
Why append a NULL object if the "present" says it is false anyway?
~
32b.
"%{authorization_role}I " -- why do they have extra space on the end?
Just let the append_XXX functions can take care of the space
separators automagically instead.
------
33. deparse_AlterDomainStmt
+ {
+ fmt = "ALTER DOMAIN";
+ type = "drop default";
+ alterDom = new_objtree_VA(fmt, 1, "type", ObjTypeString, type);
This style of assigning 'fmt' and 'type' is not typical of the other
deparse_XXX functions, which just pass parameter literals. Also, I see
no good reason for 'fmt' to be unconditionally assigned to "ALTER
DOMAIN" in 6 different places.
~
34.
AFAICT all these cases can be simplified to use single VA() calls and
remove all the append_XXX.
~
35.
+
+ break;
+ case 'N':
Spurious or misplaced blank line.
~
36.
+ case 'C':
+
+ /*
+ * ADD CONSTRAINT. Only CHECK constraints are supported by
+ * domains
+ */
A spurious blank line is inconsistent with the other cases.
~
36.
+
+ break;
+ default:
Spurious or misplaced blank line.
------
37. deparse_CreateStatisticsStmt
+ append_format_string(createStat, "FROM");
+
+ append_object_object(createStat, "%{stat_table_identity}D",
+ new_objtree_for_qualname(get_rel_namespace(statform->stxrelid),
+ get_rel_name(statform->stxrelid)));
It would be easier to do things like this using a single call with a
format of "FROM %{stat_table_identity}D", rather than having the extra
append_format_string call.
------
38. deparse_CreateForeignServerStmt
+ /* Add a TYPE clause, if any */
+ tmp = new_objtree_VA("TYPE", 0);
+ if (node->servertype)
+ append_string_object(tmp, "%{type}L", node->servertype);
+ else
+ append_bool_object(tmp, "present", false);
+ append_object_object(createServer, "%{type}s", tmp);
+
+ /* Add a VERSION clause, if any */
+ tmp = new_objtree_VA("VERSION", 0);
Why use the VA() function if passing 0 args?
~
39.
+ append_string_object(createServer, "FOREIGN DATA WRAPPER %{fdw}I",
node->fdwname);
+ /* add an OPTIONS clause, if any */
+ append_object_object(createServer, "%{generic_options}s",
+ deparse_FdwOptions(node->options, NULL));
39a.
Use uppercase comment.
~
39b.
Missing blank line above comment?
------
40. deparse_AlterForeignServerStmt
+ /* Add a VERSION clause, if any */
+ tmp = new_objtree_VA("VERSION", 0);
Why use the VA() function if passing 0 args?
~
41.
+ /* Add a VERSION clause, if any */
+ tmp = new_objtree_VA("VERSION", 0);
+ if (node->has_version && node->version)
+ append_string_object(tmp, "%{version}L", node->version);
+ else if (node->has_version)
+ append_string_object(tmp, "version", "NULL");
+ else
+ append_bool_object(tmp, "present", false);
+ append_object_object(alterServer, "%{version}s", tmp);
+
+ /* Add a VERSION clause, if any */
+ tmp = new_objtree_VA("VERSION", 0);
+ if (node->has_version && node->version)
+ append_string_object(tmp, "%{version}L", node->version);
+ else if (node->has_version)
+ append_string_object(tmp, "version", "NULL");
+ else
+ append_bool_object(tmp, "present", false);
+ append_object_object(alterServer, "%{version}s", tmp);
Huh? Looks like a cut/paste error of duplicate VERSION clauses. Is this correct?
------
42. deparse_CreateStmt
+ if (tableelts == NIL)
+ {
+ tmpobj = new_objtree("");
+ append_bool_object(tmpobj, "present", false);
+ }
+ else
+ tmpobj = new_objtree_VA("(%{elements:, }s)", 1,
+ "elements", ObjTypeArray, tableelts);
This fragment seemed a bit complicated. IIUC this is the same as just:
tmpobj = new_objtree("");
if (tableelts)
append_array_object(tmpobj, "(%{elements:, }s)", tableelts);
else
append_bool_object(tmpobj, "present", false);
~
43.
+ tmpobj = new_objtree("INHERITS");
+ if (list_length(node->inhRelations) > 0)
+ append_array_object(tmpobj, "(%{parents:, }D)",
deparse_InhRelations(objectId));
+ else
+ {
+ append_null_object(tmpobj, "(%{parents:, }D)");
+ append_bool_object(tmpobj, "present", false);
+ }
+ append_object_object(createStmt, "%{inherits}s", tmpobj);
43a.
AFAIK the convention for checking a non-empty List is just
"if (node->inhRelations != NIL)" or simply "if (node->inhRelations)".
~
43b.
Maybe I misunderstand something, but I don't see why append_null_object
is needed for a tree marked "present"=false anyhow. This same pattern
occurs multiple times in this function.
------
44. deparse_DefineStmt
+ switch (define->kind)
+ {
IMO it is better to put all these OBJECT_XXX cases in alphabetical
order instead of in random order.
~
45.
+ default:
+ elog(ERROR, "unsupported object kind");
+ }
Should this also log which define->kind was attempted?
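e.g. (sketch):

default:
    elog(ERROR, "unsupported object kind %d", (int) define->kind);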
------
46. deparse_DefineStmt_Collation
+ stmt = new_objtree_VA("CREATE COLLATION", 0);
+
+ append_object_object(stmt, "%{identity}D",
+ new_objtree_for_qualname(colForm->collnamespace,
+ NameStr(colForm->collname)));
Why not combine these to avoid a VA call with 0 args and use a VA call with 1 arg instead?
~
47.
+ if (fromCollid.objectId != InvalidOid)
Use the OidIsValid macro.
~
48.
+ append_object_object(stmt, "FROM %{from_identity}D",
+ new_objtree_for_qualname(fromColForm->collnamespace,
+ NameStr(fromColForm->collname)));
+
+
+ ReleaseSysCache(tp);
+ ReleaseSysCache(colTup);
+ return stmt;
Extra blank line.
~
49.
+ if (!isnull)
+ {
+ tmp = new_objtree_VA("LOCALE=", 1,
+ "clause", ObjTypeString, "locale");
+ append_string_object(tmp, "%{locale}L",
+ psprintf("%s", TextDatumGetCString(datum)));
IMO it should be easy enough to express this using a single VA(2 args)
function, so avoiding the extra append_string. e.g. other functions
like deparse_DefineStmt_Operator do this.
And this same comment also applies to the rest of this function:
- tmp = new_objtree_VA("LC_COLLATE=", 1,
- tmp = new_objtree_VA("LC_CTYPE=", 1,
- tmp = new_objtree_VA("PROVIDER=", 1,
- tmp = new_objtree_VA("PROVIDER=", 1,
- tmp = new_objtree_VA("DETERMINISTIC=", 1,
- tmp = new_objtree_VA("VERSION=", 1,
------
[1] ALTER EXTENSION - https://www.postgresql.org/docs/15/sql-alterextension.html
Kind Regards.
Peter Smith
Fujitsu Australia
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-11 05:17:58 |
| Message-ID: | CAHut+PsFwO7xOpng59CXVk-6_1vSj-TTYuy0Lwunj-sdxMBvbg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Nov 11, 2022 at 4:09 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Fri, Nov 11, 2022 at 3:47 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > Here are more review comments for the v32-0001 file ddl_deparse.c
> >
> > *** NOTE - my review post became too big, so I split it into smaller parts.
>
THIS IS PART 3 OF 4.
=======
src/backend/commands/ddl_deparse.c
50. deparse_DefineStmt_Operator
+/*
+ * Deparse a DefineStmt (CREATE OPERATOR)
+ *
+ * Given a trigger OID and the parse tree that created it, return an ObjTree
+ * representing the creation command.
+ */
+static ObjTree *
+deparse_DefineStmt_Operator(Oid objectId, DefineStmt *define)
"trigger OID" ?? Is that right?
~
51.
/* MERGES */
tmpobj = new_objtree_VA("MERGES", 1,
"clause", ObjTypeString, "merges");
if (!oprForm->oprcanmerge)
append_bool_object(tmpobj, "present", false);
list = lappend(list, new_object_object(tmpobj));
/* HASHES */
tmpobj = new_objtree_VA("HASHES", 1,
"clause", ObjTypeString, "hashes");
if (!oprForm->oprcanhash)
append_bool_object(tmpobj, "present", false);
list = lappend(list, new_object_object(tmpobj));
Maybe HASHES and MERGES should be done in a different order, just to
be consistent with the PG documentation [2].
------
52. deparse_DefineStmt_Type
+ /* Shortcut processing for shell types. */
+ if (!typForm->typisdefined)
+ {
+ stmt = new_objtree_VA("CREATE TYPE", 0);
+ append_object_object(stmt, "%{identity}D",
+ new_objtree_for_qualname(typForm->typnamespace,
+ NameStr(typForm->typname)));
+ append_bool_object(stmt, "present", true);
+ ReleaseSysCache(typTup);
+ return stmt;
+ }
+
+ stmt = new_objtree_VA("CREATE TYPE", 0);
+ append_object_object(stmt, "%{identity}D",
+ new_objtree_for_qualname(typForm->typnamespace,
+ NameStr(typForm->typname)));
+ append_bool_object(stmt, "present", true);
52a.
This code looked strange because everything is the same except the
Release/return, so IMO it should be refactored to use the common code.
~
52b.
The VA(0 args) should be combined with the subsequent appends to use
fewer append_XXX calls.
~
53.
Is it necessary to say append_bool_object(stmt, "present", true); ? --
I'd assumed that is the default unless it explicitly says false.
~
54.
/* INPUT */
tmp = new_objtree_VA("(INPUT=", 1,
"clause", ObjTypeString, "input");
append_object_object(tmp, "%{procedure}D",
new_objtree_for_qualname_id(ProcedureRelationId,
typForm->typinput));
list = lappend(list, new_object_object(tmp));
/* OUTPUT */
tmp = new_objtree_VA("OUTPUT=", 1,
"clause", ObjTypeString, "output");
append_object_object(tmp, "%{procedure}D",
new_objtree_for_qualname_id(ProcedureRelationId,
typForm->typoutput));
list = lappend(list, new_object_object(tmp));
These could each be simplified into single VA() function calls, the
same as was done in deparse_DefineStmt_Operator PROCEDURE.
And the same comment applies to other parts. e.g.:
- /* CATEGORY */
- /* ALIGNMENT */
- STORAGE
~
55.
+ tmp = new_objtree_VA("STORAGE=", 1,
+ "clause", ObjTypeString, "storage");
Missing comment above this to say /* STORAGE */
~
56.
+ /* INTERNALLENGTH */
+ if (typForm->typlen == -1)
+ {
+ tmp = new_objtree_VA("INTERNALLENGTH=VARIABLE", 0);
+ }
+ else
+ {
+ tmp = new_objtree_VA("INTERNALLENGTH=%{typlen}n", 1,
+ "typlen", ObjTypeInteger, typForm->typlen);
+ }
56a.
The VA(args = 0) does not need to be a VA function.
~
56b.
The { } blocks are unnecessary
------
57. deparse_DefineStmt_TSConfig
+
+static ObjTree *
+deparse_DefineStmt_TSConfig(Oid objectId, DefineStmt *define,
+ ObjectAddress copied)
Missing function comment.
~
58.
+ stmt = new_objtree("CREATE");
+
+ append_object_object(stmt, "TEXT SEARCH CONFIGURATION %{identity}D",
+ new_objtree_for_qualname(tscForm->cfgnamespace,
+ NameStr(tscForm->cfgname)));
Why not combine these using VA() function?
~
59.
+ list = NIL;
+ /* COPY */
Just assign NIL when declared.
~
60.
+ if (copied.objectId != InvalidOid)
Use OidIsValid macro.
------
61. deparse_DefineStmt_TSParser
+
+static ObjTree *
+deparse_DefineStmt_TSParser(Oid objectId, DefineStmt *define)
Missing function comment.
~
62.
+ stmt = new_objtree("CREATE");
+
+ append_object_object(stmt, "TEXT SEARCH PARSER %{identity}D",
+ new_objtree_for_qualname(tspForm->prsnamespace,
+ NameStr(tspForm->prsname)));
Why not combine as a single VA() function call?
~
63.
+ list = NIL;
Just assign NIL when declared
~
64.
tmp = new_objtree_VA("START=", 1,
"clause", ObjTypeString, "start");
append_object_object(tmp, "%{procedure}D",
new_objtree_for_qualname_id(ProcedureRelationId,
tspForm->prsstart));
Easily combined to be a single VA() function call.
The same comment applies for
- /* GETTOKEN */
- /* END */
- /* LEXTYPES */
------
65. deparse_DefineStmt_TSDictionary
+static ObjTree *
+deparse_DefineStmt_TSDictionary(Oid objectId, DefineStmt *define)
Missing function comment.
~
66.
+ stmt = new_objtree("CREATE");
+
+ append_object_object(stmt, "TEXT SEARCH DICTIONARY %{identity}D",
+ new_objtree_for_qualname(tsdForm->dictnamespace,
+ NameStr(tsdForm->dictname)));
Why not combine this as a single VA() function call?
~
67.
+ list = NIL;
Just assign NIL when declared
~
68.
+ tmp = new_objtree_VA("", 0);
Don't need VA() function for 0 args.
------
69. deparse_DefineStmt_TSTemplate
+static ObjTree *
+deparse_DefineStmt_TSTemplate(Oid objectId, DefineStmt *define)
Missing function comment.
~
70.
+ stmt = new_objtree("CREATE");
+
+ append_object_object(stmt, "TEXT SEARCH TEMPLATE %{identity}D",
+ new_objtree_for_qualname(tstForm->tmplnamespace,
+ NameStr(tstForm->tmplname)));
Combine this to be a single VA() function call.
~
71.
+ list = NIL;
Just assign NIL when declared
~
72.
+ tmp = new_objtree_VA("LEXIZE=", 1,
+ "clause", ObjTypeString, "lexize");
+ append_object_object(tmp, "%{procedure}D",
+ new_objtree_for_qualname_id(ProcedureRelationId,
+ tstForm->tmpllexize));
Combine this to be a single VA() function call.
------
73. deparse_AlterTSConfigurationStmt
+static ObjTree *
+deparse_AlterTSConfigurationStmt(CollectedCommand *cmd)
Missing function comment.
~
74.
+ /* determine the format string appropriate to each subcommand */
+ switch (node->kind)
Uppercase comment
~
75.
+ tmp = new_objtree_VA("IF EXISTS", 0);
Should not use a VA() function with 0 args.
~
76.
+ case ALTER_TSCONFIG_ALTER_MAPPING_FOR_TOKEN:
+ append_object_object(config, "%{identity}D ALTER MAPPING",
+ new_objtree_for_qualname_id(cmd->d.atscfg.address.classId,
+ cmd->d.atscfg.address.objectId));
+ break;
+
+ case ALTER_TSCONFIG_REPLACE_DICT:
+ append_object_object(config, "%{identity}D ALTER MAPPING",
+ new_objtree_for_qualname_id(cmd->d.atscfg.address.classId,
+ cmd->d.atscfg.address.objectId));
+ break;
+
+ case ALTER_TSCONFIG_REPLACE_DICT_FOR_TOKEN:
+ append_object_object(config, "%{identity}D ALTER MAPPING",
+ new_objtree_for_qualname_id(cmd->d.atscfg.address.classId,
+ cmd->d.atscfg.address.objectId));
+ break;
If all these 3 cases have identical code then why repeat it three times?
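e.g. let the three cases fall through to shared code (sketch):

case ALTER_TSCONFIG_ALTER_MAPPING_FOR_TOKEN:
case ALTER_TSCONFIG_REPLACE_DICT:
case ALTER_TSCONFIG_REPLACE_DICT_FOR_TOKEN:
    append_object_object(config, "%{identity}D ALTER MAPPING",
                         new_objtree_for_qualname_id(cmd->d.atscfg.address.classId,
                                                     cmd->d.atscfg.address.objectId));
    break;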
~
77.
+ /* add further subcommand-specific elements */
Uppercase comment
~
78.
+ /* the REPLACE forms want old and new dictionaries */
+ Assert(cmd->d.atscfg.ndicts == 2);
Uppercase comment.
------
79. deparse_AlterTSDictionaryStmt
+
+static ObjTree *
+deparse_AlterTSDictionaryStmt(Oid objectId, Node *parsetree)
Missing function comment
~
80.
+ alterTSD = new_objtree("ALTER TEXT SEARCH DICTIONARY");
+
+ append_object_object(alterTSD, "%{identity}D",
+ new_objtree_for_qualname(tsdForm->dictnamespace,
+ NameStr(tsdForm->dictname)));
Combine this into a single VA() function call.
~
81.
+ tmp = new_objtree_VA("", 0);
Don't use the VA() function for 0 args.
------
82. deparse_RelSetOptions
+ if (is_reset)
+ fmt = "RESET ";
+ else
+ fmt = "SET ";
+
+ relset = new_objtree(fmt);
82a.
Those format trailing spaces are a bit unusual. The append_XXX will
take care of space separators anyhow so it is not needed like this.
~
82b.
This can all be simplified to one line:
relset = new_objtree(is_reset ? "RESET" : "SET");
------
83. deparse_ViewStmt
+ * Given a view OID and the parsetree that created it, return an ObjTree
+ * representing the creation command.
+ */
Be consistent with other function headers:
"parsetree" -> "parse tree".
~
84.
+ viewStmt = new_objtree("CREATE");
+
+ append_string_object(viewStmt, "%{or_replace}s",
+ node->replace ? "OR REPLACE" : "");
+
+ append_string_object(viewStmt, "%{persistence}s",
+ get_persistence_str(relation->rd_rel->relpersistence));
+
+ tmp = new_objtree_for_qualname(relation->rd_rel->relnamespace,
+ RelationGetRelationName(relation));
+
+ append_object_object(viewStmt, "VIEW %{identity}D", tmp);
+
+ append_string_object(viewStmt, "AS %{query}s",
+ pg_get_viewdef_internal(objectId));
IMO all of this can be combined in a single VA() function call.
------
85. deparse_CreateTableAsStmt_vanilla
+/*
+ * Deparse CREATE Materialized View statement, it is a variant of CreateTableAsStmt
+ *
+ * Note that CREATE TABLE AS SELECT INTO can also be deparsed by
+ * deparse_CreateTableAsStmt to remove the SELECT INTO clause.
+ */
+static ObjTree *
+deparse_CreateTableAsStmt_vanilla(Oid objectId, Node *parsetree)
The function comment refers to 'deparse_CreateTableAsStmt' but I don't
see any such function. Maybe it was renamed, causing the comment to
become stale?
~
86.
+ /* Add identity */
+ append_object_object(createStmt, "%{identity}D",
+ new_objtree_for_qualname_id(RelationRelationId,
+ objectId));
This could be included as another arg of the preceding VA() call.
~
87.
+ /* COLLUMNS clause */
+ if (node->into->colNames == NIL)
+ tmp = new_objtree_VA("", 1,
+ "present", ObjTypeBool, false);
+ else
87a.
Typo "COLLUMNS"
~
87b.
It might be more usual/natural to reverse this if/else to check the
list is NOT empty. e.g.
if (node->into->colNames)
...
else
tmp = new_objtree_VA("", 1,
"present", ObjTypeBool, false);
~
88.
+ tmp = new_objtree("USING");
+ if (node->into->accessMethod)
+ append_string_object(tmp, "%{access_method}I", node->into->accessMethod);
+ else
+ {
+ append_null_object(tmp, "%{access_method}I");
+ append_bool_object(tmp, "present", false);
+ }
I'm not sure why a null object is necessary when present = false.
~
89.
+ /* WITH clause */
+ tmp = new_objtree_VA("WITH", 0);
VA() function call is not needed when there are 0 args.
~
90.
+ /* TABLESPACE clause */
+ tmp = new_objtree_VA("TABLESPACE %{tablespace}I", 0);
VA() function call not needed when there are 0 args.
~
91.
+ else
+ {
+ append_null_object(tmp, "%{tablespace}I");
+ append_bool_object(tmp, "present", false);
+ }
I'm not sure why a null object is necessary when present = false.
~
92.
+ /* add a WITH NO DATA clause */
+ tmp = new_objtree_VA("WITH NO DATA", 1,
+ "present", ObjTypeBool,
+ node->into->skipData ? true : false);
+ append_object_object(createStmt, "%{with_no_data}s", tmp);
92a.
Uppercase comment.
~
92b.
It is a bit confusing that this style of specifying an empty tree (just
saying present/not present) is used here, while elsewhere in this patch
similar syntax just adds text or an empty string.
e.g.
+ append_string_object(renameStmt, "%{if_exists}s",
+ node->missing_ok ? "IF EXISTS" : "");
IMO it's better to apply a consistent deparse approach for everything.
But without documentation of the deparse structure, it is kind of
impossible to know what the rules even *are*.
------
93. deparse_CreateTrigStmt
+ trigger = new_objtree("CREATE");
+
+ append_string_object(trigger, "%{constraint}s",
+ node->isconstraint ? "CONSTRAINT" : "");
+
+ append_string_object(trigger, "TRIGGER %{name}I", node->trigname);
All this can be combined into a single VA() call.
~
94.
+ if (node->timing == TRIGGER_TYPE_BEFORE)
+ append_string_object(trigger, "%{time}s", "BEFORE");
+ else if (node->timing == TRIGGER_TYPE_AFTER)
+ append_string_object(trigger, "%{time}s", "AFTER");
+ else if (node->timing == TRIGGER_TYPE_INSTEAD)
+ append_string_object(trigger, "%{time}s", "INSTEAD OF");
+ else
+ elog(ERROR, "unrecognized trigger timing type %d", node->timing);
It might be better to assign the value to a char* and then just have
only a single append_string_object() call.
char *tval =
node->timing == TRIGGER_TYPE_BEFORE ? "BEFORE" :
node->timing == TRIGGER_TYPE_AFTER ? "AFTER" :
node->timing == TRIGGER_TYPE_INSTEAD ? "INSTEAD OF" :
NULL;
if (tval == NULL)
elog(ERROR, "unrecognized trigger timing type %d", node->timing);
append_string_object(trigger, "%{time}s", tval);
~
95.
+ tmpobj = new_objtree_VA("FROM", 0);
VA() function call is not needed for 0 args.
~
96.
+ tmpobj = new_objtree_VA("WHEN", 0);
VA() function call is not needed for 0 args.
~
97.
Should use consistent wording for unexpected nulls.
e.g.1
+ if (isnull)
+ elog(ERROR, "bogus NULL tgqual");
e.g.2
+ if (isnull)
+ elog(ERROR, "invalid NULL tgargs");
~
98.
+ append_format_string(tmpobj, "(");
+ append_array_object(tmpobj, "%{args:, }L", list); /* might be NIL */
+ append_format_string(tmpobj, ")");
IMO probably that can be a single call to append_array_object which
includes the enclosing parens.
------
[2] CREATE OPERATOR -
https://www.postgresql.org/docs/current/sql-createoperator.html
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-11 05:33:04 |
| Message-ID: | CAHut+PvBzyJuKqMDUOs4AZPSKqNaNfQBLXsFn-O_OZqyXeLv+A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Nov 11, 2022 at 4:17 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Fri, Nov 11, 2022 at 4:09 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > On Fri, Nov 11, 2022 at 3:47 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > > Here are more review comments for the v32-0001 file ddl_deparse.c
> > >
> > > *** NOTE - my review post became too big, so I split it into smaller parts.
> >
>
THIS IS PART 4 OF 4.
=======
src/backend/commands/ddl_deparse.c
99. deparse_CreateUserMappingStmt
+ /*
+ * Lookup up object in the catalog, so we don't have to deal with
+ * current_user and such.
+ */
Typo "Lookup up"
~
100.
+ createStmt = new_objtree("CREATE USER MAPPING ");
+
+ append_object_object(createStmt, "FOR %{role}R",
new_objtree_for_role_id(form->umuser));
+
+ append_string_object(createStmt, "SERVER %{server}I", server->servername);
All this can be combined into a single VA() function call.
~
101.
+ /* add an OPTIONS clause, if any */
Uppercase comment.
------
102. deparse_AlterUserMappingStmt
+ /*
+ * Lookup up object in the catalog, so we don't have to deal with
+ * current_user and such.
+ */
+
+ tp = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(objectId));
102a.
Typo "Lookup up"
~
102b.
Unnecessary blank line.
~
103.
+ alterStmt = new_objtree("ALTER USER MAPPING");
+
+ append_object_object(alterStmt, "FOR %{role}R",
new_objtree_for_role_id(form->umuser));
+
+ append_string_object(alterStmt, "SERVER %{server}I", server->servername);
Can be combined into a single VA() function call.
~
104.
+ /* add an OPTIONS clause, if any */
Uppercase comment
------
105. deparse_AlterStatsStmt
+ alterStat = new_objtree("ALTER STATISTICS");
+
+ /* Lookup up object in the catalog */
+ tp = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(objectId));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for statistic %u", objectId);
+
+ statform = (Form_pg_statistic_ext) GETSTRUCT(tp);
+
+ append_object_object(alterStat, "%{identity}D",
+ new_objtree_for_qualname(statform->stxnamespace,
+ NameStr(statform->stxname)));
+
+ append_float_object(alterStat, "SET STATISTICS %{target}n", node->stxstattarget);
105a.
It was a bit unusual to create the new_objtree even before the catalog lookup.
~
105b.
All new_objtreee and append_XXX can be combined as a single VA()
function call here.
------
106. deparse_RefreshMatViewStmt
+ refreshStmt = new_objtree_VA("REFRESH MATERIALIZED VIEW", 0);
+
+ /* Add a CONCURRENTLY clause */
+ append_string_object(refreshStmt, "%{concurrently}s",
+ node->concurrent ? "CONCURRENTLY" : "");
+ /* Add the matview name */
+ append_object_object(refreshStmt, "%{identity}D",
+ new_objtree_for_qualname_id(RelationRelationId,
+ objectId));
+ /* Add a WITH NO DATA clause */
+ tmp = new_objtree_VA("WITH NO DATA", 1,
+ "present", ObjTypeBool,
+ node->skipData ? true : false);
+ append_object_object(refreshStmt, "%{with_no_data}s", tmp);
106a.
Don't use VA() function for 0 args.
~
106b.
Huh? There are 2 different implementation styles here for the optional clauses:
- CONCURRENTLY is just replaced with an empty string
- WITH NO DATA gets a new ObjTree that is either present/not present
~
106c.
Most/all of this can be combined into a single VA call.
------
107. deparse_DefElem
+ set = new_objtree("");
+ optname = new_objtree("");
+
+ if (elem->defnamespace != NULL)
+ append_string_object(optname, "%{schema}I.", elem->defnamespace);
+
+ append_string_object(optname, "%{label}I", elem->defname);
+
+ append_object_object(set, "%{label}s", optname);
The set should be created *after* the optname; then it can be done
something like:
set = new_objtree_VA("%{label}s", 1, "label", ObjTypeObject, optname);
~
108.
+ append_string_object(set, " = %{value}L",
+ elem->arg ? defGetString(elem) :
+ defGetBoolean(elem) ? "TRUE" : "FALSE");
The calling code does not need to prefix the format with spaces like
this. The append_XXX will handle space separators automatically.
------
109. deparse_drop_command
+ stmt = new_objtree_VA(fmt, 1, "objidentity", ObjTypeString, identity);
+ stmt2 = new_objtree_VA("CASCADE", 1,
+ "present", ObjTypeBool, behavior == DROP_CASCADE);
+
+ append_object_object(stmt, "%{cascade}s", stmt2);
109a.
'stmt2' is a poor name. "CASCADE" is not a statement. Even 'tmpobj'
would be better here.
~
109b.
The 2 lines of cascade should be grouped together -- i.e. the blank
line should be *above* the "CASCADE", not below it.
------
110. deparse_FunctionSet
+ obj = new_objtree("RESET");
+ append_string_object(obj, "%{set_name}I", name);
This can be combined as a single VA() call with a format "RESET %{set_name}I".
~
111.
+ if (kind == VAR_RESET_ALL)
+ {
+ obj = new_objtree("RESET ALL");
+ }
+ else if (value != NULL)
It seems a bit strange that the decision is sometimes made based on the
*value*. Why isn't this just decided according to the different
VariableSetKind values (e.g. VAR_SET_VALUE)?
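e.g. switch on the kind directly (a sketch only; the SET/RESET format
strings here are illustrative, not taken from the patch):

switch (kind)
{
    case VAR_RESET_ALL:
        obj = new_objtree("RESET ALL");
        break;

    case VAR_RESET:
        obj = new_objtree_VA("RESET %{set_name}I", 1,
                             "set_name", ObjTypeString, name);
        break;

    case VAR_SET_VALUE:
        obj = new_objtree_VA("SET %{set_name}I TO %{set_value}s", 2,
                             "set_name", ObjTypeString, name,
                             "set_value", ObjTypeString, value);
        break;

    default:
        elog(ERROR, "unexpected VariableSetKind %d", (int) kind);
}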
------
112. deparse_IndexStmt
+ indexStmt = new_objtree("CREATE");
+
+ append_string_object(indexStmt, "%{unique}s",
+ node->unique ? "UNIQUE" : "");
+
+ append_format_string(indexStmt, "INDEX");
+
+ append_string_object(indexStmt, "%{concurrently}s",
+ node->concurrent ? "CONCURRENTLY" : "");
+
+ append_string_object(indexStmt, "%{if_not_exists}s",
+ node->if_not_exists ? "IF NOT EXISTS" : "");
+
+ append_string_object(indexStmt, "%{name}I",
+ RelationGetRelationName(idxrel));
+
+ append_object_object(indexStmt, "ON %{table}D",
+ new_objtree_for_qualname(heaprel->rd_rel->relnamespace,
+ RelationGetRelationName(heaprel)));
+
+ append_string_object(indexStmt, "USING %{index_am}s", index_am);
+
+ append_string_object(indexStmt, "(%{definition}s)", definition);
This could all be combined to a single VA() function call.
------
113. deparse_OnCommitClause
+ case ONCOMMIT_NOOP:
+ append_null_object(oncommit, "%{on_commit_value}s");
+ append_bool_object(oncommit, "present", false);
+ break;
Why is the null object necessary when the entire "ON COMMIT" is present=false?
------
114. deparse_RenameStmt
+ renameStmt = new_objtree_VA(fmtstr, 0);
+ append_string_object(renameStmt, "%{if_exists}s",
+ node->missing_ok ? "IF EXISTS" : "");
+ append_object_object(renameStmt, "%{identity}D",
+ new_objtree_for_qualname(schemaId,
+ node->relation->relname));
+ append_string_object(renameStmt, "RENAME TO %{newname}I",
+ node->newname);
114a.
Don't use VA() for 0 args.
~
114b.
Anyway, all these can be combined to a single new_objtree_VA() call.
~
115.
+ renameStmt = new_objtree_VA("ALTER POLICY", 0);
+ append_string_object(renameStmt, "%{if_exists}s",
+ node->missing_ok ? "IF EXISTS" : "");
+ append_string_object(renameStmt, "%{policyname}I", node->subname);
+ append_object_object(renameStmt, "ON %{identity}D",
+ new_objtree_for_qualname_id(RelationRelationId,
+ polForm->polrelid));
+ append_string_object(renameStmt, "RENAME TO %{newname}I",
+ node->newname);
All these can be combined into a single VA() call.
~
116.
relation_close(pg_policy, AccessShareLock);
}
break;
case OBJECT_ATTRIBUTE:
Spurious blank line before the }
~
117.
+ objtype = stringify_objtype(node->relationType);
+ fmtstr = psprintf("ALTER %s", objtype);
+ renameStmt = new_objtree(fmtstr);
The code seems over-complicated. All these temporary assignments are
not really necessary.
Maybe better to remove the psprintf anyway, as per my general comment
at the top of this review post.
~
118.
+ relation_close(relation, AccessShareLock);
+
+ break;
+ case OBJECT_FUNCTION:
The misplaced blank line should be *after* the break; not before it.
~
119.
+ char *fmt;
+
+ fmt = psprintf("ALTER %s %%{identity}D USING %%{index_method}s
RENAME TO %%{newname}I",
+ stringify_objtype(node->renameType));
Let's be consistent about the variable naming, at least within the same
function. Elsewhere 'fmt' was 'fmtstr', so make them all the same
(pick one).
~
120.
+ objtype = stringify_objtype(node->renameType);
+ fmtstring = psprintf("ALTER %s", objtype);
+
+ renameStmt = new_objtree_VA(fmtstring,
+ 0);
+ append_object_object(renameStmt, "%{identity}D",
+ new_objtree_for_qualname(DatumGetObjectId(objnsp),
+ strVal(llast(identity))));
+
+ append_string_object(renameStmt, "RENAME TO %{newname}I",
+ node->newname);
120a.
Simplify this by not doing the assignment to 'objtype'.
~
120b.
All this can be combined as a single VA() call.
------
121. deparse_AlterDependStmt
+deparse_AlterDependStmt(Oid objectId, Node *parsetree)
+{
+ AlterObjectDependsStmt *node = (AlterObjectDependsStmt *) parsetree;
+ ObjTree *alterDependeStmt = NULL;
+
+
+ if (node->objectType == OBJECT_INDEX)
Double blank lines?
~
122.
+ alterDependeStmt = new_objtree("ALTER INDEX");
+
+ qualified = new_objtree_for_qualname(relation->rd_rel->relnamespace,
+ node->relation->relname);
+ append_object_object(alterDependeStmt, "%{identity}D", qualified);
This could be combined into a single VA() call.
In fact, everything could be combined if the code were refactored a bit
so that only the assignment of 'qualified' happens while the relation is
still open.
SUGGESTION
qualified = new_objtree_for_qualname(relation->rd_rel->relnamespace,
                                     node->relation->relname);
relation_close(relation, AccessShareLock);
stmt = new_objtree_VA("ALTER INDEX %{identity}D %{no}s DEPENDS ON EXTENSION %{newname}I", 3,
                      "identity", ObjTypeObject, qualified,
                      "no", ObjTypeString, node->remove ? "NO" : "",
                      "newname", ObjTypeString, strVal(node->extname));
~
123.
+ append_string_object(alterDependeStmt, "%{NO}s",
+ node->remove ? "NO" : "");
IMO it seems more conventional for the substitution marker to be
lowercase. So this should say "%{no}s" instead.
~
124.
AlterObjectDependsStmt *node = (AlterObjectDependsStmt *) parsetree;
ObjTree *alterDependeStmt = NULL;
Why 'alterDependeStmt' with the extra 'e' -- is it a typo? Anyway, the
name seems overkill - just 'stmt' would be fine.
------
125. GENERAL comments for all the deparse_Seq_XXX functions
Comments common for:
- deparse_Seq_Cache
- deparse_Seq_Cycle
- deparse_Seq_IncrementBy
- deparse_Seq_Maxvalue
- deparse_Seq_Minvalue
- deparse_Seq_OwnedBy
- deparse_Seq_Restart
- deparse_Seq_Startwith
125a
Most of the deparse_Seq_XXX functions are prefixed with "SET" which is
needed for ALTER TABLE only.
e.g.
if (alter_table)
fmt = "SET %{no}s CYCLE";
else
fmt = "%{no}s CYCLE";
IMO all these "SET" additions should be done at the point of the call
when doing the ALTER TABLE instead of polluting all these helper
functions. Remove the alter_table function parameter.
~
125b.
IMO it would be tidier with a blank line before the returns.
~
125c.
The function parameter *parent is unused.
------
126. deparse_RuleStmt
+ ruleStmt = new_objtree("CREATE RULE");
+
+ append_string_object(ruleStmt, "%{or_replace}s",
+ node->replace ? "OR REPLACE" : "");
+
+ append_string_object(ruleStmt, "%{identity}I",
+ node->rulename);
+
+ append_string_object(ruleStmt, "AS ON %{event}s",
+ node->event == CMD_SELECT ? "SELECT" :
+ node->event == CMD_UPDATE ? "UPDATE" :
+ node->event == CMD_DELETE ? "DELETE" :
+ node->event == CMD_INSERT ? "INSERT" : "XXX");
+ append_object_object(ruleStmt, "TO %{table}D",
+ new_objtree_for_qualname_id(RelationRelationId,
+ rewrForm->ev_class));
+
+ append_string_object(ruleStmt, "DO %{instead}s",
+ node->instead ? "INSTEAD" : "ALSO");
I suspect all of this can be combined to be a single VA() function call.
~
127.
+ append_string_object(ruleStmt, "AS ON %{event}s",
+ node->event == CMD_SELECT ? "SELECT" :
+ node->event == CMD_UPDATE ? "UPDATE" :
+ node->event == CMD_DELETE ? "DELETE" :
+ node->event == CMD_INSERT ? "INSERT" : "XXX");
The bogus "XXX" looks a bit dodgy. Probably it would be better to
assign this 'event_str' separately and Assert/Error if node->event is
not supported.
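e.g. (sketch):

char       *event_str;

switch (node->event)
{
    case CMD_SELECT:
        event_str = "SELECT";
        break;
    case CMD_UPDATE:
        event_str = "UPDATE";
        break;
    case CMD_DELETE:
        event_str = "DELETE";
        break;
    case CMD_INSERT:
        event_str = "INSERT";
        break;
    default:
        elog(ERROR, "unrecognized rule event type %d", (int) node->event);
}
append_string_object(ruleStmt, "AS ON %{event}s", event_str);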
~
128.
+ tmp = new_objtree_VA("WHERE %{clause}s", 0);
+
+ if (qual)
+ append_string_object(tmp, "clause", qual);
+ else
+ {
+ append_null_object(tmp, "clause");
+ append_bool_object(tmp, "present", false);
+ }
+
+ append_object_object(ruleStmt, "where_clause", tmp);
This doesn't look right to me...
128a.
Using VA() with 0 args
~
128b.
Using a null object to fudge the substitution of "%{clause}s" is avoidable IMO.
~
128c.
Shouldn't there be a "%{where_clause}s" on the ruleStmt format?
------
129. deparse_CreateTransformStmt
+ createTransform = new_objtree("CREATE");
+
+ append_string_object(createTransform, "%{or_replace}s",
+ node->replace ? "OR REPLACE" : "");
+ append_object_object(createTransform, "TRANSFORM FOR %{typename}D",
+ new_objtree_for_qualname_id(TypeRelationId,
+ trfForm->trftype));
+ append_string_object(createTransform, "LANGUAGE %{language}I",
+ NameStr(langForm->lanname));
This can all be combined into a single VA() function.
~
130.
+ /* deparse the transform_element_list */
+ if (trfForm->trffromsql != InvalidOid)
130a.
Uppercase comment
~
130b.
Use OidIsValid macro.
~
131.
+ /*
+ * Verbose syntax
+ *
+ * CREATE %{or_replace}s TRANSFORM FOR %{typename}D LANGUAGE
+ * %{language}I ( FROM SQL WITH FUNCTION %{signature}s, TO SQL WITH
+ * FUNCTION %{signature_tof}s )
+ *
+ * OR
+ *
+ * CREATE %{or_replace}s TRANSFORM FOR %{typename}D LANGUAGE
+ * %{language}I ( TO SQL WITH FUNCTION %{signature_tof}s )
+ */
+
According to the PG docs [3], *either* part of FROM/TO SQL WITH
FUNCTION is optional, so a "FROM SQL" without a "TO SQL" is also
allowed. The comment should say this too.
~
132.
There are multiple other places in this code where should use OidIsValid macro.
e.g.
+ if (trfForm->trftosql != InvalidOid)
e.g.
+ /* Append a ',' if trffromsql is present, else append '(' */
+ append_string_object(createTransform, "%{comma}s",
+ trfForm->trffromsql != InvalidOid ? "," : "(");
~
133.
These strange substitutions could've just used the
append_format_string() function, I think.
133a
+ /* Append a ',' if trffromsql is present, else append '(' */
+ append_string_object(createTransform, "%{comma}s",
+ trfForm->trffromsql != InvalidOid ? "," : "(");
SUGGESTION
append_format_string(createTransform, OidIsValid(trfForm->trffromsql) ? "," : "(");
~
133b.
+ append_string_object(createTransform, "%{close_bracket}s", ")");
SUGGESTION
append_format_string(createTransform, ")");
~
134.
+ sign = new_objtree("");
+
+ append_object_object(sign, "%{identity}D",
+ new_objtree_for_qualname(procForm->pronamespace,
+ NameStr(procForm->proname)));
+ append_array_object(sign, "(%{arguments:, }s)", params);
+
+ append_object_object(createTransform, "TO SQL WITH FUNCTION
%{signature_tof}s", sign);
134a.
IIUC it's a bit clunky to parse out this whole fmt looking for a '{'
to extract the name "signature_tof" (maybe it works but there is a lot
of inefficiency hidden under the covers I think), when with some small
refactoring this could be done using a VA() function passing in the
known name.
~
134b.
Looks like 'sign' is either a typo or a very misleading name. Isn't that
supposed to be the ObjTree for the "signature_tof"?
------
135. append_literal_or_null
+static void
+append_literal_or_null(ObjTree *mainobj, char *elemname, char *value)
In other functions 'mainobj' would have been called 'parent'. I think
parent is a better name.
~
136.
+ top = new_objtree_VA("", 0);
Don't use VA() for 0 args.
~
137.
+ top = new_objtree_VA("", 0);
+ part = new_objtree_VA("NULL", 1,
+ "present", ObjTypeBool,
+ !value);
+ append_object_object(top, "%{null}s", part);
+ part = new_objtree_VA("", 1,
+ "present", ObjTypeBool,
+ !!value);
+ if (value)
+ append_string_object(part, "%{value}L", value);
+ append_object_object(top, "%{literal}s", part);
137a.
Suggest to put each VA arg name/value on the same line.
e.g.
+ part = new_objtree_VA("NULL", 1,
+ "present", ObjTypeBool, !value);
~
137b.
The '!!' is cute but seems an uncommon technique in PG sources. Maybe
better to just say value != NULL.
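e.g.
part = new_objtree_VA("", 1,
                      "present", ObjTypeBool, value != NULL);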
~
137c.
Suggest adding a blank line to separate the logic of the 2 parts.
(e.g. above the 2nd part = new_objtree_VA).
------
138. deparse_CommentOnConstraintSmt
+ comment = new_objtree("COMMENT ON CONSTRAINT");
+
+ append_string_object(comment, "%{identity}s",
pstrdup(NameStr(constrForm->conname)));
+ append_format_string(comment, "ON");
+
+ if (node->objtype == OBJECT_DOMCONSTRAINT)
+ append_format_string(comment, "DOMAIN");
+
+ append_string_object(comment, "%{parentobj}s",
+ getObjectIdentity(&addr, false));
This can mostly be done as a single VA() call I think.
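e.g. Untested sketch (the conditional DOMAIN part still needs its own append):

comment = new_objtree_VA("COMMENT ON CONSTRAINT %{identity}s ON", 1,
                         "identity", ObjTypeString, pstrdup(NameStr(constrForm->conname)));

if (node->objtype == OBJECT_DOMCONSTRAINT)
	append_format_string(comment, "DOMAIN");

append_string_object(comment, "%{parentobj}s",
                     getObjectIdentity(&addr, false));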
------
139. deparse_CommentStmt
+
+static ObjTree *
+deparse_CommentStmt(ObjectAddress address, Node *parsetree)
Missing function comment.
~
140.
+ comment = new_objtree("COMMENT ON");
+ append_string_object(comment, "%{objtype}s", (char *)
stringify_objtype(node->objtype));
A single VA() function call can do this.
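e.g.
comment = new_objtree_VA("COMMENT ON %{objtype}s", 1,
                         "objtype", ObjTypeString,
                         (char *) stringify_objtype(node->objtype));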
------
141. deparse_CreateAmStmt
+
+static ObjTree *
+deparse_CreateAmStmt(Oid objectId, Node *parsetree)
Missing function comment.
~
142.
+ CreateAm = new_objtree("CREATE ACCESS METHOD");
+ append_string_object(CreateAm, "%{identity}I",
+ NameStr(amForm->amname));
+
+ switch (amForm->amtype)
+ {
+ case 'i':
+ amtype = "INDEX";
+ break;
+ case 't':
+ amtype = "TABLE";
+ break;
+ default:
+ elog(ERROR, "invalid type %c for access method", amForm->amtype);
+ }
+ append_string_object(CreateAm, "TYPE %{am_type}s", amtype);
+
+ /* Add the HANDLER clause */
+ append_object_object(CreateAm, "HANDLER %{handler}D",
+ new_objtree_for_qualname_id(ProcedureRelationId,
+ amForm->amhandler));
This entire thing can be done as a single VA() function call.
SUGGESTION
switch (amForm->amtype)
{
case 'i':
amtype = "INDEX";
break;
case 't':
amtype = "TABLE";
break;
default:
elog(ERROR, "invalid type %c for access method", amForm->amtype);
}
createAm = new_objtree_VA("CREATE ACCESS METHOD %{identity}I TYPE
%{am_type}s HANDLER %{handler}D", 3,
"identity", ObjTypeString, NameStr(amForm->amname),
"am_type", ObjTypeString, amtype,
"handler", ObjTypeObject,
new_objtree_for_qualname_id(ProcedureRelationId, amForm->amhandler));
------
143. deparse_simple_command
+ switch (nodeTag(parsetree))
+ {
+ case T_CreateSchemaStmt:
+ command = deparse_CreateSchemaStmt(objectId, parsetree);
+ break;
+
+ case T_AlterDomainStmt:
+ command = deparse_AlterDomainStmt(objectId, parsetree,
+ cmd->d.simple.secondaryObject);
+ break;
+
+ case T_CreateStmt:
+ command = deparse_CreateStmt(objectId, parsetree);
+ break;
+
+ case T_RefreshMatViewStmt:
+ command = deparse_RefreshMatViewStmt(objectId, parsetree);
+ break;
+
+ case T_CreateTrigStmt:
+ command = deparse_CreateTrigStmt(objectId, parsetree);
+ break;
+
+ case T_RuleStmt:
+ command = deparse_RuleStmt(objectId, parsetree);
+ break;
+
+ case T_CreatePLangStmt:
+ command = deparse_CreateLangStmt(objectId, parsetree);
+ break;
+
+ case T_CreateSeqStmt:
+ command = deparse_CreateSeqStmt(objectId, parsetree);
+ break;
+
+ case T_CreateFdwStmt:
+ command = deparse_CreateFdwStmt(objectId, parsetree);
+ break;
+
+ case T_CreateUserMappingStmt:
+ command = deparse_CreateUserMappingStmt(objectId, parsetree);
+ break;
+
+ case T_AlterUserMappingStmt:
+ command = deparse_AlterUserMappingStmt(objectId, parsetree);
+ break;
+
+ case T_AlterStatsStmt:
+ command = deparse_AlterStatsStmt(objectId, parsetree);
+ break;
+
+ case T_AlterFdwStmt:
+ command = deparse_AlterFdwStmt(objectId, parsetree);
+ break;
+
+ case T_AlterSeqStmt:
+ command = deparse_AlterSeqStmt(objectId, parsetree);
+ break;
+
+ case T_DefineStmt:
+ command = deparse_DefineStmt(objectId, parsetree,
+ cmd->d.simple.secondaryObject);
+ break;
+
+ case T_CreateConversionStmt:
+ command = deparse_CreateConversion(objectId, parsetree);
+ break;
+
+ case T_CreateDomainStmt:
+ command = deparse_CreateDomain(objectId, parsetree);
+ break;
+
+ case T_CreateExtensionStmt:
+ command = deparse_CreateExtensionStmt(objectId, parsetree);
+ break;
+
+ case T_AlterExtensionStmt:
+ command = deparse_AlterExtensionStmt(objectId, parsetree);
+ break;
+
+ case T_AlterExtensionContentsStmt:
+ command = deparse_AlterExtensionContentsStmt(objectId, parsetree,
+ cmd->d.simple.secondaryObject);
+ break;
+
+ case T_CreateOpFamilyStmt:
+ command = deparse_CreateOpFamily(objectId, parsetree);
+ break;
+
+ case T_CreatePolicyStmt:
+ command = deparse_CreatePolicyStmt(objectId, parsetree);
+ break;
+
+ case T_IndexStmt:
+ command = deparse_IndexStmt(objectId, parsetree);
+ break;
+
+ case T_CreateFunctionStmt:
+ command = deparse_CreateFunction(objectId, parsetree);
+ break;
+
+ case T_AlterFunctionStmt:
+ command = deparse_AlterFunction(objectId, parsetree);
+ break;
+
+ case T_AlterCollationStmt:
+ command = deparse_AlterCollation(objectId, parsetree);
+ break;
+
+ case T_RenameStmt:
+ command = deparse_RenameStmt(cmd->d.simple.address, parsetree);
+ break;
+
+ case T_AlterObjectDependsStmt:
+ command = deparse_AlterDependStmt(objectId, parsetree);
+ break;
+
+ case T_AlterObjectSchemaStmt:
+ command = deparse_AlterObjectSchemaStmt(cmd->d.simple.address,
+ parsetree,
+ cmd->d.simple.secondaryObject);
+ break;
+
+ case T_AlterOwnerStmt:
+ command = deparse_AlterOwnerStmt(cmd->d.simple.address, parsetree);
+ break;
+
+ case T_AlterOperatorStmt:
+ command = deparse_AlterOperatorStmt(objectId, parsetree);
+ break;
+
+ case T_AlterPolicyStmt:
+ command = deparse_AlterPolicyStmt(objectId, parsetree);
+ break;
+
+ case T_AlterTypeStmt:
+ command = deparse_AlterTypeSetStmt(objectId, parsetree);
+ break;
+
+ case T_CreateStatsStmt:
+ command = deparse_CreateStatisticsStmt(objectId, parsetree);
+ break;
+
+ case T_CreateForeignServerStmt:
+ command = deparse_CreateForeignServerStmt(objectId, parsetree);
+ break;
+
+ case T_AlterForeignServerStmt:
+ command = deparse_AlterForeignServerStmt(objectId, parsetree);
+ break;
+
+ case T_CompositeTypeStmt:
+ command = deparse_CompositeTypeStmt(objectId, parsetree);
+ break;
+
+ case T_CreateEnumStmt: /* CREATE TYPE AS ENUM */
+ command = deparse_CreateEnumStmt(objectId, parsetree);
+ break;
+
+ case T_CreateRangeStmt: /* CREATE TYPE AS RANGE */
+ command = deparse_CreateRangeStmt(objectId, parsetree);
+ break;
+
+ case T_AlterEnumStmt:
+ command = deparse_AlterEnumStmt(objectId, parsetree);
+ break;
+
+ case T_CreateCastStmt:
+ command = deparse_CreateCastStmt(objectId, parsetree);
+ break;
+
+ case T_AlterTSDictionaryStmt:
+ command = deparse_AlterTSDictionaryStmt(objectId, parsetree);
+ break;
+
+ case T_CreateTransformStmt:
+ command = deparse_CreateTransformStmt(objectId, parsetree);
+ break;
+
+ case T_ViewStmt: /* CREATE VIEW */
+ command = deparse_ViewStmt(objectId, parsetree);
+ break;
+
+ case T_CreateTableAsStmt: /* CREATE MATERIALIZED VIEW */
+ command = deparse_CreateTableAsStmt_vanilla(objectId, parsetree);
+ break;
+
+ case T_CommentStmt:
+ command = deparse_CommentStmt(cmd->d.simple.address, parsetree);
+ break;
+
+ case T_CreateAmStmt:
+ command = deparse_CreateAmStmt(objectId, parsetree);
+ break;
143a.
Suggestion to put all these cases in alphabetical order.
~
143b.
Suggest removing the variable 'command' and for each case just return
the deparse_XXX result -- doing this will eliminate the need for
"break;" and so the function can be 50 lines shorter.
------
144. deparse_TableElements
+ if (tree != NULL)
+ {
+ ObjElem *column;
+
+ column = new_object_object(tree);
+ elements = lappend(elements, column);
+ }
Why do all this instead of just:
if (tree != NULL)
elements = lappend(elements, new_object_object(tree));
------
145. deparse_utility_command
+ if (tree)
+ {
+ Jsonb *jsonb;
+
+ jsonb = objtree_to_jsonb(tree);
+ command = JsonbToCString(&str, &jsonb->root, JSONB_ESTIMATED_LEN);
+ }
+ else
+ command = NULL;
145a.
Since 'tree' is always assigned the result of deparse_XXX I am
wondering if tree == NULL is even possible here? If not then this
if/else should be an Assert instead.
~
145b.
Anyway, maybe assign default command = NULL in the declaration to
reduce a couple of lines of unnecessary code.
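e.g. Sketch only:

char	   *command = NULL;

/* ... */

if (tree)
{
	Jsonb	   *jsonb = objtree_to_jsonb(tree);

	command = JsonbToCString(&str, &jsonb->root, JSONB_ESTIMATED_LEN);
}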
------
[3] CREATE TRANSFORM -
https://www.postgresql.org/docs/current/sql-createtransform.html
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-11 14:39:17 |
| Message-ID: | CALDaNm06V5EE43o3LOoqx8MEvBuhNwGp1jqUUKQ13e+L-ppDNw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, 4 Nov 2022 at 15:06, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Wed, 2 Nov 2022 at 05:13, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Mon, 31 Oct 2022 at 16:17, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > On Thu, 27 Oct 2022 at 16:02, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > >
> > > > On Thu, 27 Oct 2022 at 02:09, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > > > >
> > > > > > Adding support for deparsing of CREATE/ALTER/DROP LANGUAGE for ddl replication.
> > > > >
> > > > > Adding support for deparsing of:
> > > > > COMMENT
> > > > > ALTER DEFAULT PRIVILEGES
> > > > > CREATE/DROP ACCESS METHOD
> > > >
> > > > Adding support for deparsing of:
> > > > ALTER/DROP ROUTINE
> > > >
> > > > The patch also includes fixes for the following issues:
> > >
> >
> Few comments:
> 1) If the user has specified a non-existing object, then we will throw
> the wrong error.
> +Datum
> +publication_deparse_ddl_command_start(PG_FUNCTION_ARGS)
> +{
> + EventTriggerData *trigdata;
> + char *command = psprintf("Drop table command start");
> + DropStmt *stmt;
> + ListCell *cell1;
> +
> + if (!CALLED_AS_EVENT_TRIGGER(fcinfo))
> + elog(ERROR, "not fired by event trigger manager");
> +
> + trigdata = (EventTriggerData *) fcinfo->context;
> + stmt = (DropStmt *) trigdata->parsetree;
> +
> + /* extract the relid from the parse tree */
> + foreach(cell1, stmt->objects)
> + {
> + char relpersist;
> + Node *object = lfirst(cell1);
> + ObjectAddress address;
> + Relation relation = NULL;
> +
> + address = get_object_address(stmt->removeType,
> + object,
> +
> &relation,
> +
> AccessExclusiveLock,
> + true);
> +
> + relpersist = get_rel_persistence(address.objectId);
>
> We could check relation is NULL after getting address and skip
> processing that object
Modified
> 2) Materialized view handling is missing:
> + switch (rel->rd_rel->relkind)
> + {
> + case RELKIND_RELATION:
> + case RELKIND_PARTITIONED_TABLE:
> + reltype = "TABLE";
> + break;
> + case RELKIND_INDEX:
> + case RELKIND_PARTITIONED_INDEX:
> + reltype = "INDEX";
> + break;
> + case RELKIND_VIEW:
> + reltype = "VIEW";
> + break;
> + case RELKIND_COMPOSITE_TYPE:
> + reltype = "TYPE";
> + istype = true;
> + break;
>
> We could use this scenario for debugging and verifying:
> ALTER MATERIALIZED VIEW testschema.amv SET TABLESPACE regress_tblspace;
Modified
> 3) Readdition of alter table readd statistics is not handled:
>
> + case AT_DropIdentity:
> + tmpobj = new_objtree_VA("ALTER COLUMN
> %{column}I DROP IDENTITY", 2,
> +
> "type", ObjTypeString, "drop identity",
> +
> "column", ObjTypeString, subcmd->name);
> +
> + append_string_object(tmpobj,
> "%{if_not_exists}s",
> +
> subcmd->missing_ok ? "IF EXISTS" : "");
> +
> + subcmds = lappend(subcmds,
> new_object_object(tmpobj));
> + break;
> + default:
> + elog(WARNING, "unsupported alter table
> subtype %d",
> + subcmd->subtype);
> + break;
> + }
>
>
> We could use this scenario for debugging and verifying:
> CREATE TABLE functional_dependencies (
> filler1 TEXT,
> filler2 NUMERIC,
> a INT,
> b TEXT,
> filler3 DATE,
> c INT,
> d TEXT
> )
> WITH (autovacuum_enabled = off);
> CREATE STATISTICS func_deps_stat (dependencies) ON a, b, c FROM
> functional_dependencies;
> TRUNCATE functional_dependencies;
> ALTER TABLE functional_dependencies ALTER COLUMN c TYPE numeric;
Modified
> 4) "Alter sequence as" option not hanlded
>
> + if (strcmp(elem->defname, "cache") == 0)
> + newelm = deparse_Seq_Cache(alterSeq, seqform, false);
> + else if (strcmp(elem->defname, "cycle") == 0)
> + newelm = deparse_Seq_Cycle(alterSeq, seqform, false);
> + else if (strcmp(elem->defname, "increment") == 0)
> + newelm = deparse_Seq_IncrementBy(alterSeq,
> seqform, false);
> + else if (strcmp(elem->defname, "minvalue") == 0)
> + newelm = deparse_Seq_Minvalue(alterSeq, seqform, false);
> + else if (strcmp(elem->defname, "maxvalue") == 0)
> + newelm = deparse_Seq_Maxvalue(alterSeq, seqform, false);
> + else if (strcmp(elem->defname, "start") == 0)
> + newelm = deparse_Seq_Startwith(alterSeq,
> seqform, false);
> + else if (strcmp(elem->defname, "restart") == 0)
> + newelm = deparse_Seq_Restart(alterSeq, seqdata);
> + else if (strcmp(elem->defname, "owned_by") == 0)
> + newelm = deparse_Seq_OwnedBy(alterSeq, objectId, false);
> + else
> + elog(ERROR, "invalid sequence option %s",
> elem->defname);
>
> We could use this scenario for debugging and verifying:
> ALTER SEQUENCE seq1 AS smallint;
Modified
> 5) alter table row level security is not handled:
>
> + case AT_DropIdentity:
> + tmpobj = new_objtree_VA("ALTER COLUMN
> %{column}I DROP IDENTITY", 2,
> +
> "type", ObjTypeString, "drop identity",
> +
> "column", ObjTypeString, subcmd->name);
> +
> + append_string_object(tmpobj,
> "%{if_not_exists}s",
> +
> subcmd->missing_ok ? "IF EXISTS" : "");
> +
> + subcmds = lappend(subcmds,
> new_object_object(tmpobj));
> + break;
> + default:
> + elog(WARNING, "unsupported alter table
> subtype %d",
> + subcmd->subtype);
> + break;
>
> We could use this scenario for debugging and verifying:
> CREATE TABLE r1 (a int);
> ALTER TABLE r1 FORCE ROW LEVEL SECURITY;
Modified
> 6) alter table add primary key is not handled:
>
> + case AT_DropIdentity:
> + tmpobj = new_objtree_VA("ALTER COLUMN
> %{column}I DROP IDENTITY", 2,
> +
> "type", ObjTypeString, "drop identity",
> +
> "column", ObjTypeString, subcmd->name);
> +
> + append_string_object(tmpobj,
> "%{if_not_exists}s",
> +
> subcmd->missing_ok ? "IF EXISTS" : "");
> +
> + subcmds = lappend(subcmds,
> new_object_object(tmpobj));
> + break;
> + default:
> + elog(WARNING, "unsupported alter table
> subtype %d",
> + subcmd->subtype);
> + break;
>
> We could use this scenario for debugging and verifying:
> create table idxpart (a int) partition by range (a);
> create table idxpart0 (like idxpart);
> alter table idxpart0 add primary key (a);
> alter table idxpart attach partition idxpart0 for values from (0) to (1000);
> alter table only idxpart add primary key (a);
Modified
> 7) Code not updated based on new change:
>
> 7.a) identity_column should be removed from new_objtree_VA
> + case AT_AddIdentity:
> + {
> + AttrNumber attnum;
> + Oid seq_relid;
> + ObjTree *seqdef;
> + ColumnDef *coldef =
> (ColumnDef *) subcmd->def;
> +
> + tmpobj = new_objtree_VA("ALTER
> COLUMN %{column}I ADD %{identity_column}s", 2,
> +
> "type", ObjTypeString, "add identity",
> +
> "column", ObjTypeString, subcmd->name);
> +
> + attnum =
> get_attnum(RelationGetRelid(rel), subcmd->name);
> + seq_relid =
> getIdentitySequence(RelationGetRelid(rel), attnum, true);
> + seqdef =
> deparse_ColumnIdentity(seq_relid, coldef->identity, false);
> +
> + append_object_object(tmpobj,
> "identity_column", seqdef);
>
> 7.b) identity_column should be changed to "%{identity_column}s" in
> append_object_object
>
> We could use this scenario for debugging and verifying:
> CREATE TABLE itest4 (a int NOT NULL, b text);
> ALTER TABLE itest4 ALTER COLUMN a ADD GENERATED ALWAYS AS IDENTITY;
Modified
> 8) SearchSysCache1 copied twice, one of it should be removed
> + /*
> + * Lookup up object in the catalog, so we don't have to deal with
> + * current_user and such.
> + */
> +
> + tp = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(objectId));
> + if (!HeapTupleIsValid(tp))
> + elog(ERROR, "cache lookup failed for user mapping %u",
> objectId);
> +
> + form = (Form_pg_user_mapping) GETSTRUCT(tp);
> +
> + /*
> + * Lookup up object in the catalog, so we don't have to deal with
> + * current_user and such.
> + */
> +
> + tp = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(objectId));
> + if (!HeapTupleIsValid(tp))
> + elog(ERROR, "cache lookup failed for user mapping %u",
> objectId);
Modified
> 9) Create table with INCLUDING GENERATED not handled:
> + case AT_DropIdentity:
> + tmpobj = new_objtree_VA("ALTER COLUMN
> %{column}I DROP IDENTITY", 2,
> +
> "type", ObjTypeString, "drop identity",
> +
> "column", ObjTypeString, subcmd->name);
> +
> + append_string_object(tmpobj,
> "%{if_not_exists}s",
> +
> subcmd->missing_ok ? "IF EXISTS" : "");
> +
> + subcmds = lappend(subcmds,
> new_object_object(tmpobj));
> + break;
> + default:
> + elog(WARNING, "unsupported alter table
> subtype %d",
> + subcmd->subtype);
> + break;
>
> We could use this scenario for debugging and verifying:
> CREATE TABLE gtest28a (a int, b int, c int, x int GENERATED ALWAYS
> AS (b * 2) STORED);
> CREATE TABLE gtest28b (LIKE gtest28a INCLUDING GENERATED);
Modified
The attached v36 patch has the changes for the same.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v36-0004-Test-cases-for-DDL-replication.patch | text/x-patch | 24.6 KB |
| v36-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | text/x-patch | 15.0 KB |
| v36-0002-Support-DDL-replication.patch | text/x-patch | 133.5 KB |
| v36-0001-Functions-to-deparse-DDL-commands.patch | text/x-patch | 318.7 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-14 06:33:18 |
| Message-ID: | CALDaNm0arv0uo_gP5uco7k1HW1PRg6brvtjSRSsOjjkcUHe68Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, 11 Nov 2022 at 20:09, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Fri, 4 Nov 2022 at 15:06, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Wed, 2 Nov 2022 at 05:13, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > On Mon, 31 Oct 2022 at 16:17, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > >
> > > > On Thu, 27 Oct 2022 at 16:02, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Thu, 27 Oct 2022 at 02:09, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > > Adding support for deparsing of CREATE/ALTER/DROP LANGUAGE for ddl replication.
> > > > > >
> > > > > > Adding support for deparsing of:
> > > > > > COMMENT
> > > > > > ALTER DEFAULT PRIVILEGES
> > > > > > CREATE/DROP ACCESS METHOD
> > > > >
> > > > > Adding support for deparsing of:
> > > > > ALTER/DROP ROUTINE
> > > > >
> > > > > The patch also includes fixes for the following issues:
> > > >
> > >
> > Few comments:
> > 1) If the user has specified a non-existing object, then we will throw
> > the wrong error.
> > +Datum
> > +publication_deparse_ddl_command_start(PG_FUNCTION_ARGS)
> > +{
> > + EventTriggerData *trigdata;
> > + char *command = psprintf("Drop table command start");
> > + DropStmt *stmt;
> > + ListCell *cell1;
> > +
> > + if (!CALLED_AS_EVENT_TRIGGER(fcinfo))
> > + elog(ERROR, "not fired by event trigger manager");
> > +
> > + trigdata = (EventTriggerData *) fcinfo->context;
> > + stmt = (DropStmt *) trigdata->parsetree;
> > +
> > + /* extract the relid from the parse tree */
> > + foreach(cell1, stmt->objects)
> > + {
> > + char relpersist;
> > + Node *object = lfirst(cell1);
> > + ObjectAddress address;
> > + Relation relation = NULL;
> > +
> > + address = get_object_address(stmt->removeType,
> > + object,
> > +
> > &relation,
> > +
> > AccessExclusiveLock,
> > + true);
> > +
> > + relpersist = get_rel_persistence(address.objectId);
> >
> > We could check relation is NULL after getting address and skip
> > processing that object
>
> Modified
>
> > 2) Materialized view handling is missing:
> > + switch (rel->rd_rel->relkind)
> > + {
> > + case RELKIND_RELATION:
> > + case RELKIND_PARTITIONED_TABLE:
> > + reltype = "TABLE";
> > + break;
> > + case RELKIND_INDEX:
> > + case RELKIND_PARTITIONED_INDEX:
> > + reltype = "INDEX";
> > + break;
> > + case RELKIND_VIEW:
> > + reltype = "VIEW";
> > + break;
> > + case RELKIND_COMPOSITE_TYPE:
> > + reltype = "TYPE";
> > + istype = true;
> > + break;
> >
> > We could use this scenario for debugging and verifying:
> > ALTER MATERIALIZED VIEW testschema.amv SET TABLESPACE regress_tblspace;
>
> Modified
>
> > 3) Readdition of alter table readd statistics is not handled:
> >
> > + case AT_DropIdentity:
> > + tmpobj = new_objtree_VA("ALTER COLUMN
> > %{column}I DROP IDENTITY", 2,
> > +
> > "type", ObjTypeString, "drop identity",
> > +
> > "column", ObjTypeString, subcmd->name);
> > +
> > + append_string_object(tmpobj,
> > "%{if_not_exists}s",
> > +
> > subcmd->missing_ok ? "IF EXISTS" : "");
> > +
> > + subcmds = lappend(subcmds,
> > new_object_object(tmpobj));
> > + break;
> > + default:
> > + elog(WARNING, "unsupported alter table
> > subtype %d",
> > + subcmd->subtype);
> > + break;
> > + }
> >
> >
> > We could use this scenario for debugging and verifying:
> > CREATE TABLE functional_dependencies (
> > filler1 TEXT,
> > filler2 NUMERIC,
> > a INT,
> > b TEXT,
> > filler3 DATE,
> > c INT,
> > d TEXT
> > )
> > WITH (autovacuum_enabled = off);
> > CREATE STATISTICS func_deps_stat (dependencies) ON a, b, c FROM
> > functional_dependencies;
> > TRUNCATE functional_dependencies;
> > ALTER TABLE functional_dependencies ALTER COLUMN c TYPE numeric;
>
> Modified
>
> > 4) "Alter sequence as" option not hanlded
> >
> > + if (strcmp(elem->defname, "cache") == 0)
> > + newelm = deparse_Seq_Cache(alterSeq, seqform, false);
> > + else if (strcmp(elem->defname, "cycle") == 0)
> > + newelm = deparse_Seq_Cycle(alterSeq, seqform, false);
> > + else if (strcmp(elem->defname, "increment") == 0)
> > + newelm = deparse_Seq_IncrementBy(alterSeq,
> > seqform, false);
> > + else if (strcmp(elem->defname, "minvalue") == 0)
> > + newelm = deparse_Seq_Minvalue(alterSeq, seqform, false);
> > + else if (strcmp(elem->defname, "maxvalue") == 0)
> > + newelm = deparse_Seq_Maxvalue(alterSeq, seqform, false);
> > + else if (strcmp(elem->defname, "start") == 0)
> > + newelm = deparse_Seq_Startwith(alterSeq,
> > seqform, false);
> > + else if (strcmp(elem->defname, "restart") == 0)
> > + newelm = deparse_Seq_Restart(alterSeq, seqdata);
> > + else if (strcmp(elem->defname, "owned_by") == 0)
> > + newelm = deparse_Seq_OwnedBy(alterSeq, objectId, false);
> > + else
> > + elog(ERROR, "invalid sequence option %s",
> > elem->defname);
> >
> > We could use this scenario for debugging and verifying:
> > ALTER SEQUENCE seq1 AS smallint;
>
> Modified
>
> > 5) alter table row level security is not handled:
> >
> > + case AT_DropIdentity:
> > + tmpobj = new_objtree_VA("ALTER COLUMN
> > %{column}I DROP IDENTITY", 2,
> > +
> > "type", ObjTypeString, "drop identity",
> > +
> > "column", ObjTypeString, subcmd->name);
> > +
> > + append_string_object(tmpobj,
> > "%{if_not_exists}s",
> > +
> > subcmd->missing_ok ? "IF EXISTS" : "");
> > +
> > + subcmds = lappend(subcmds,
> > new_object_object(tmpobj));
> > + break;
> > + default:
> > + elog(WARNING, "unsupported alter table
> > subtype %d",
> > + subcmd->subtype);
> > + break;
> >
> > We could use this scenario for debugging and verifying:
> > CREATE TABLE r1 (a int);
> > ALTER TABLE r1 FORCE ROW LEVEL SECURITY;
>
> Modified
>
> > 6) alter table add primary key is not handled:
> >
> > + case AT_DropIdentity:
> > + tmpobj = new_objtree_VA("ALTER COLUMN
> > %{column}I DROP IDENTITY", 2,
> > +
> > "type", ObjTypeString, "drop identity",
> > +
> > "column", ObjTypeString, subcmd->name);
> > +
> > + append_string_object(tmpobj,
> > "%{if_not_exists}s",
> > +
> > subcmd->missing_ok ? "IF EXISTS" : "");
> > +
> > + subcmds = lappend(subcmds,
> > new_object_object(tmpobj));
> > + break;
> > + default:
> > + elog(WARNING, "unsupported alter table
> > subtype %d",
> > + subcmd->subtype);
> > + break;
> >
> > We could use this scenario for debugging and verifying:
> > create table idxpart (a int) partition by range (a);
> > create table idxpart0 (like idxpart);
> > alter table idxpart0 add primary key (a);
> > alter table idxpart attach partition idxpart0 for values from (0) to (1000);
> > alter table only idxpart add primary key (a);
>
> Modified
>
> > 7) Code not updated based on new change:
> >
> > 7.a) identity_column should be removed from new_objtree_VA
> > + case AT_AddIdentity:
> > + {
> > + AttrNumber attnum;
> > + Oid seq_relid;
> > + ObjTree *seqdef;
> > + ColumnDef *coldef =
> > (ColumnDef *) subcmd->def;
> > +
> > + tmpobj = new_objtree_VA("ALTER
> > COLUMN %{column}I ADD %{identity_column}s", 2,
> > +
> > "type", ObjTypeString, "add identity",
> > +
> > "column", ObjTypeString, subcmd->name);
> > +
> > + attnum =
> > get_attnum(RelationGetRelid(rel), subcmd->name);
> > + seq_relid =
> > getIdentitySequence(RelationGetRelid(rel), attnum, true);
> > + seqdef =
> > deparse_ColumnIdentity(seq_relid, coldef->identity, false);
> > +
> > + append_object_object(tmpobj,
> > "identity_column", seqdef);
> >
> > 7.b) identity_column should be changed to "%{identity_column}s" in
> > append_object_object
> >
> > We could use this scenario for debugging and verifying:
> > CREATE TABLE itest4 (a int NOT NULL, b text);
> > ALTER TABLE itest4 ALTER COLUMN a ADD GENERATED ALWAYS AS IDENTITY;
>
> Modified
> > 8) SearchSysCache1 copied twice, one of it should be removed
> > + /*
> > + * Lookup up object in the catalog, so we don't have to deal with
> > + * current_user and such.
> > + */
> > +
> > + tp = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(objectId));
> > + if (!HeapTupleIsValid(tp))
> > + elog(ERROR, "cache lookup failed for user mapping %u",
> > objectId);
> > +
> > + form = (Form_pg_user_mapping) GETSTRUCT(tp);
> > +
> > + /*
> > + * Lookup up object in the catalog, so we don't have to deal with
> > + * current_user and such.
> > + */
> > +
> > + tp = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(objectId));
> > + if (!HeapTupleIsValid(tp))
> > + elog(ERROR, "cache lookup failed for user mapping %u",
> > objectId);
>
> Modified
>
> > 9) Create table with INCLUDING GENERATED not handled:
> > + case AT_DropIdentity:
> > + tmpobj = new_objtree_VA("ALTER COLUMN
> > %{column}I DROP IDENTITY", 2,
> > +
> > "type", ObjTypeString, "drop identity",
> > +
> > "column", ObjTypeString, subcmd->name);
> > +
> > + append_string_object(tmpobj,
> > "%{if_not_exists}s",
> > +
> > subcmd->missing_ok ? "IF EXISTS" : "");
> > +
> > + subcmds = lappend(subcmds,
> > new_object_object(tmpobj));
> > + break;
> > + default:
> > + elog(WARNING, "unsupported alter table
> > subtype %d",
> > + subcmd->subtype);
> > + break;
> >
> > We could use this scenario for debugging and verifying:
> > CREATE TABLE gtest28a (a int, b int, c int, x int GENERATED ALWAYS
> > AS (b * 2) STORED);
> > CREATE TABLE gtest28b (LIKE gtest28a INCLUDING GENERATED);
>
> Modified
>
> The attached v36 patch has the changes for the same.
CFBot reported an issue with the patch, the updated patch has the
changes for the same.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v36-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | text/x-patch | 15.0 KB |
| v36-0004-Test-cases-for-DDL-replication.patch | text/x-patch | 24.6 KB |
| v36-0002-Support-DDL-replication.patch | text/x-patch | 133.5 KB |
| v36-0001-Functions-to-deparse-DDL-commands.patch | text/x-patch | 318.8 KB |
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-14 23:57:08 |
| Message-ID: | CAHut+PsERMFwO8oK3LFH_3CRG+512T+ay_viWzrgNetbH2MwxA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Here are some review comments for v32-0002
======
1. Commit message
Comment says:
While creating a publication, we register a command end
trigger that deparses the DDL as a JSON blob, and WAL logs it. The event
trigger is automatically removed at the time of drop publication.
SUGGESTION (uppercase the SQL)
During CREATE PUBLICATION we register a command end trigger that
deparses the DDL as a JSON blob, and WAL logs it. The event
trigger is automatically removed at the time of DROP PUBLICATION.
~~~
2.
Comment says:
This is a POC patch to show how using event triggers and DDL deparsing
facilities we can implement DDL replication. So, the implementation is
restricted to CREATE TABLE/ALTER TABLE/DROP TABLE commands.
~
Still correct or old comment gone stale?
~~~
3.
Comment says:
Note that the replication for ALTER INDEX command is still under
progress.
~
Still correct or old comment gone stale?
======
4. GENERAL - Patch order.
Somehow, I feel this v32-0002 patch and the v32-0001 patch should be
swapped. IIUC this one seems to me to be the "core" framework for the
DDL message replication but the other 0001 was more like just the
implements of all the supported different *kinds* of DDL JSON blobs.
So actually this patch seems more like the mandatory one and the other
one can just evolve as it gets more supported JSON.
~~~
5. GENERAL - naming
The DDL suffix 'msg' or 'message' seemed sometimes unnecessary because
there is no ambiguity that this is a message for DDL replication, so
the shorter name conveys the same amount of information, doesn't it?
e.g. Maybe reconsider some of these ones (probably there are others)...
src/include/replication/decode.h
logicalddlmsg_decode -> Why not call this function logicalddl_decode?
src/include/replication/logicalproto.h:
LOGICAL_REP_MSG_DDLMESSAGE -> Why not call it 'LOGICAL_REP_MSG_DDL'?
logicalrep_write_ddlmessage -> Why not call this function logicalrep_write_ddl?
logicalrep_read_ddlmessage -> Why not call this function logicalrep_read_ddl?
src/include/replication/output_plugin.h:
'ddlmessage_cb' -> Why not call it 'ddl_cb'?
'stream_ddlmessage_cb' -> Why not call it 'stream_ddl_cb'?
src/include/replication/reorderbuffer.h:
- 'REORDER_BUFFER_CHANGE_DDLMESSAGE' --> Why not call it 'REORDER_BUFFER_CHANGE_DDL'?
- 'ddlmsg' -> Why not call it 'ddl'?
- 'ddlmessage' -> Why not call it 'ddl'?
- 'stream_ddlmessage' -> Why not call it 'stream_ddl'?
======
src/backend/access/rmgrdesc/Makefile
6.
@@ -19,6 +19,7 @@ OBJS = \
hashdesc.o \
heapdesc.o \
logicalmsgdesc.o \
+ logicalddlmsgdesc.o \
Change should be in alphabetical order.
======
src/backend/access/rmgrdesc/logicalddlmsgdesc.c
7. logicalddlmsg_identify
+const char *
+logicalddlmsg_identify(uint8 info)
+{
+ if ((info & ~XLR_INFO_MASK) == XLOG_LOGICAL_DDL_MESSAGE)
+ return "DDL MESSAGE";
+
+ return NULL;
+}
The logicalrep_message_type (see below) said "DDL", so maybe this
should also just say "DDL" instead of "DDL MESSAGE"
@@ -1218,6 +1264,8 @@ logicalrep_message_type(LogicalRepMsgType action)
return "TYPE";
case LOGICAL_REP_MSG_MESSAGE:
return "MESSAGE";
+ case LOGICAL_REP_MSG_DDLMESSAGE:
+ return "DDL";
======
src/backend/commands/event_trigger.c
8. start/end
+/*
+ * publication_deparse_ddl_command_start
+ *
+ * Deparse the ddl command and log it.
+ */
+Datum
+publication_deparse_ddl_command_start(PG_FUNCTION_ARGS)
...
+/*
+ * publication_deparse_ddl_command_end
+ *
+ * Deparse the ddl command and log it.
+ */
+Datum
+publication_deparse_ddl_command_end(PG_FUNCTION_ARGS)
The start/end function comments are the same -- there should be some
more explanation to say what they are for.
~~~
9. publication_deparse_ddl_command_start
+ char *command = psprintf("Drop table command start");
Huh? So this function is only for this specific case of DROP TABLE? If
correct, then I think that should be commented on or asserted
somewhere.
~
10.
+ /* extract the relid from the parse tree */
+ foreach(cell1, stmt->objects)
Uppercase comment
~
11.
+ if (relpersist == RELPERSISTENCE_TEMP)
+ {
+ table_close(relation, NoLock);
+ continue;
+ }
+
+ LogLogicalDDLMessage("deparse", address.objectId, DCT_TableDropStart,
+ command, strlen(command) + 1);
+
+ if (relation)
+ table_close(relation, NoLock);
This code looks overly complex. Can't it just be like below?
SUGGESTION
if (relpersist != RELPERSISTENCE_TEMP)
LogLogicalDDLMessage("deparse", address.objectId, DCT_TableDropStart,
command, strlen(command) + 1);
if (relation)
table_close(relation, NoLock);
~~~
12. publication_deparse_table_rewrite
+ if (relpersist == RELPERSISTENCE_TEMP)
+ return PointerGetDatum(NULL);
+
+ /* Deparse the DDL command and WAL log it to allow decoding of the same. */
+ json_string = deparse_utility_command(cmd, false);
+
+ if (json_string != NULL)
+ LogLogicalDDLMessage("deparse", cmd->d.alterTable.objectId, DCT_TableAlter,
+ json_string, strlen(json_string) + 1);
+
+ return PointerGetDatum(NULL);
Similar to previous comment I think this can be simplified so there is
only one return
SUGGESTION
if (relpersist != RELPERSISTENCE_TEMP)
{
/* Deparse the DDL command and WAL log it to allow decoding of the same. */
json_string = deparse_utility_command(cmd, false);
if (json_string != NULL)
LogLogicalDDLMessage("deparse", cmd->d.alterTable.objectId, DCT_TableAlter,
json_string, strlen(json_string) + 1);
}
return PointerGetDatum(NULL);
~~~
13. publication_deparse_ddl_command_end
+ if (relpersist == RELPERSISTENCE_TEMP)
+ continue;
+
+ /*
+ * Deparse the DDL command and WAL log it to allow decoding of the
+ * same.
+ */
+ json_string = deparse_utility_command(cmd, false);
+
+ if (json_string == NULL)
+ continue;
+
+ LogLogicalDDLMessage("deparse", relid, type, json_string,
+ strlen(json_string) + 1);
Maybe this logic is simpler without all the continue?
SUGGESTION
if (relpersist != RELPERSISTENCE_TEMP)
{
/*
* Deparse the DDL command and WAL log it to allow decoding of the
* same.
*/
json_string = deparse_utility_command(cmd, false);
if (json_string != NULL)
LogLogicalDDLMessage("deparse", relid, type, json_string,
strlen(json_string) + 1);
}
~
14.
+ if (strcmp(obj->objecttype, "table") == 0)
+ cmdtype = DCT_TableDropEnd;
+ else if (strcmp(obj->objecttype, "sequence") == 0 ||
+ strcmp(obj->objecttype, "schema") == 0 ||
+ strcmp(obj->objecttype, "index") == 0 ||
+ strcmp(obj->objecttype, "function") == 0 ||
+ strcmp(obj->objecttype, "procedure") == 0 ||
+ strcmp(obj->objecttype, "operator") == 0 ||
+ strcmp(obj->objecttype, "operator class") == 0 ||
+ strcmp(obj->objecttype, "operator family") == 0 ||
+ strcmp(obj->objecttype, "cast") == 0 ||
+ strcmp(obj->objecttype, "type") == 0 ||
+ strcmp(obj->objecttype, "domain") == 0 ||
+ strcmp(obj->objecttype, "trigger") == 0 ||
+ strcmp(obj->objecttype, "conversion") == 0 ||
+ strcmp(obj->objecttype, "policy") == 0 ||
+ strcmp(obj->objecttype, "rule") == 0 ||
+ strcmp(obj->objecttype, "extension") == 0 ||
+ strcmp(obj->objecttype, "foreign-data wrapper") == 0 ||
+ strcmp(obj->objecttype, "text search configuration") == 0 ||
+ strcmp(obj->objecttype, "text search dictionary") == 0 ||
+ strcmp(obj->objecttype, "text search parser") == 0 ||
+ strcmp(obj->objecttype, "text search template") == 0 ||
+ strcmp(obj->objecttype, "transform") == 0 ||
+ strcmp(obj->objecttype, "server") == 0 ||
+ strcmp(obj->objecttype, "collation") == 0 ||
+ strcmp(obj->objecttype, "user mapping") == 0 ||
+ strcmp(obj->objecttype, "language") == 0 ||
+ strcmp(obj->objecttype, "view") == 0 ||
+ strcmp(obj->objecttype, "materialized view") == 0 ||
+ strcmp(obj->objecttype, "statistics object") == 0 ||
+ strcmp(obj->objecttype, "access method") == 0)
+ cmdtype = DCT_ObjectDrop;
+ else
+ continue;
+
+ /* Change foreign-data wrapper to foreign data wrapper */
+ if (strncmp(obj->objecttype, "foreign-data wrapper", 20) == 0)
+ {
+ tmptype = pstrdup("foreign data wrapper");
+ command = deparse_drop_command(obj->objidentity, tmptype,
+ stmt->behavior);
+ }
+
+ /* Change statistics object to statistics */
+ else if (strncmp(obj->objecttype, "statistics object",
+ strlen("statistics object")) == 0)
+ {
+ tmptype = pstrdup("statistics");
+ command = deparse_drop_command(obj->objidentity, tmptype,
+ stmt->behavior);
+ }
+
+ /*
+ * object identity needs to be modified to make the drop work
+ *
+ * FROM: <role> on server <servername> TO : for >role> server
+ * <servername>
+ *
+ */
+ else if (strncmp(obj->objecttype, "user mapping", 12) == 0)
+ {
+ char *on_server;
+
+ tmptype = palloc(strlen(obj->objidentity) + 2);
+ on_server = strstr(obj->objidentity, "on server");
+
+ sprintf((char *) tmptype, "for ");
+ strncat((char *) tmptype, obj->objidentity, on_server - obj->objidentity);
+ strcat((char *) tmptype, on_server + 3);
+ command = deparse_drop_command(tmptype, obj->objecttype,
+ stmt->behavior);
+ }
+ else
+ command = deparse_drop_command(obj->objidentity, obj->objecttype,
+ stmt->behavior);
14a.
Why are some of these implemented as strcmp and others are implemented
as strncmp?
~
14b.
The mass strcmp seems inefficient. The same could be done in other ways like:
- use a single strstr call (where all the possibilities are in one large string)
- pass string representation of some enum and just switch on it
- etc.
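e.g. A rough, untested sketch of the single-strstr idea (the helper name
here is invented just for illustration):

/*
 * Sketch: object types mapped to DCT_ObjectDrop, '|'-delimited so that
 * e.g. "operator" cannot accidentally match "operator class".
 */
static bool
is_objectdrop_objecttype(const char *objtype)
{
	static const char *supported =
		"|sequence|schema|index|function|procedure|operator|operator class"
		"|operator family|cast|type|domain|trigger|conversion|policy|rule"
		"|extension|foreign-data wrapper|text search configuration"
		"|text search dictionary|text search parser|text search template"
		"|transform|server|collation|user mapping|language|view"
		"|materialized view|statistics object|access method|";
	char	   *key = psprintf("|%s|", objtype);
	bool		found = (strstr(supported, key) != NULL);

	pfree(key);
	return found;
}

and then the caller becomes:

if (strcmp(obj->objecttype, "table") == 0)
	cmdtype = DCT_TableDropEnd;
else if (is_objectdrop_objecttype(obj->objecttype))
	cmdtype = DCT_ObjectDrop;
else
	continue;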
~
15.
+ /*
+ * object identity needs to be modified to make the drop work
+ *
+ * FROM: <role> on server <servername> TO : for >role> server
+ * <servername>
+ *
+ */
The comment needs fixing.
~
16.
+ if (command == NULL)
+ continue;
+
+ LogLogicalDDLMessage("deparse", obj->address.objectId, cmdtype,
+ command, strlen(command) + 1);
SUGGESTION
if (command)
LogLogicalDDLMessage("deparse", obj->address.objectId, cmdtype,
command, strlen(command) + 1);
======
src/backend/commands/publicationcmds.c
17. CreateDDLReplicaEventTrigger
+ static const char *trigger_name_prefix = "pg_deparse_trig_%s_%u";
+ static const char *trigger_func_prefix = "publication_deparse_%s";
17a.
I felt the ddl deparse trigger name should have the name "ddl" in it somewhere
~
17b.
Why are these called "prefixes" ?? - They looked more just like name
format strings to me.
~~~
18. CreatePublication
+ /*
+ * Create an event trigger to allow logging of DDL statements.
+ *
+ * TODO: We need to find a better syntax to allow replication of DDL
+ * statements.
+ *
+ * XXX: This code is just to show the replication of CREATE/ALTER/DROP
+ * TABLE works. We need to enhance this once the approach for DDL
+ * replication is finalized.
+ */
+ if (pubactions.pubddl)
This comment needs updating.
~
19.
+ CommandTag end_commands[] = {
+ CMDTAG_CREATE_ACCESS_METHOD,
+ CMDTAG_DROP_ACCESS_METHOD,
+ CMDTAG_ALTER_DEFAULT_PRIVILEGES,
+ CMDTAG_COMMENT,
+ CMDTAG_CREATE_LANGUAGE,
+ CMDTAG_ALTER_LANGUAGE,
+ CMDTAG_DROP_LANGUAGE,
+ CMDTAG_CREATE_VIEW,
+ CMDTAG_ALTER_VIEW,
+ CMDTAG_DROP_VIEW,
+ CMDTAG_CREATE_MATERIALIZED_VIEW,
19a.
Some better ordering (e.g. A-Z) can be done here, and maybe use blank
lines to make the groupings more obvious.
~
19b.
Wouldn't it be better to declare these static?
======
src/backend/replication/logical/Makefile
20.
OBJS = \
decode.o \
+ ddlmessage.o\
launcher.o \
Change should be in alphabetical order.
======
src/backend/replication/logical/ddlmessage.c
21. File Comment
+ * Unlike generic logical messages, these DDL messages have only transactional
+ * mode.Note by default DDLs in PostgreSQL are transactional.
Missing space before "Note"
~~~
22. LogLogicalDDLMessage
+ /*
+ * Ensure we have a valid transaction id.
+ */
+ Assert(IsTransactionState());
+ GetCurrentTransactionId();
Single line comment should be OK here
~
23.
+ /* trailing zero is critical; see logicalddlmsg_desc */
Uppercase comment
~
24.
+ /* allow origin filtering */
Uppercase comment
======
src/backend/replication/logical/proto.c
25. logicalrep_read_ddlmessage
+ uint8 flags;
+ char *msg;
+
+ //TODO double check when do we need to get TransactionId.
+
+ flags = pq_getmsgint(in, 1);
+ if (flags != 0)
+ elog(ERROR, "unrecognized flags %u in ddl message", flags);
+ *lsn = pq_getmsgint64(in);
+ *prefix = pq_getmsgstring(in);
+ *sz = pq_getmsgint(in, 4);
+ msg = (char *) pq_getmsgbytes(in, *sz);
+
+ return msg;
25a.
This code will fail if the associated *write* function has sent an xid.
Maybe an additional param is needed to tell it when to read the xid?
~
25b.
Will be tidier to have a blank line after the elog
~~~
26. logicalrep_write_ddlmessage
+ /* transaction ID (if not valid, we're not streaming) */
+ if (TransactionIdIsValid(xid))
+ pq_sendint32(out, xid);
Perhaps this "write" function should *always* write the xid even if it
is invalid because then the "read" function will know to always read
it.
======
src/backend/replication/logical/reorderbuffer.c
27. ReorderBufferQueueDDLMessage
+ Assert(xid != InvalidTransactionId);
SUGGESTION
Assert(TransactionIdIsValid(xid));
~~~
28. ReorderBufferSerializeChange
+ data += sizeof(int);
+ memcpy(data, change->data.ddlmsg.prefix,
+ prefix_size);
+ data += prefix_size;
Unnecessary wrapping of memcpy.
~
29.
+ memcpy(data, &change->data.ddlmsg.cmdtype, sizeof(int));
+ data += sizeof(int);
Would it be better to write this as:
sizeof(DeparsedCommandType) instead of sizeof(int)
~~~
30. ReorderBufferChangeSize
+ case REORDER_BUFFER_CHANGE_DDLMESSAGE:
+ {
+ Size prefix_size = strlen(change->data.ddlmsg.prefix) + 1;
+
+ sz += prefix_size + change->data.ddlmsg.message_size +
+ sizeof(Size) + sizeof(Size) + sizeof(Oid) + sizeof(int);
sizeof(DeparsedCommandType) instead of sizeof(int)
~~~
31. ReorderBufferRestoreChange
+ case REORDER_BUFFER_CHANGE_DDLMESSAGE:
+ {
+ Size prefix_size;
+
+ /* read prefix */
+ memcpy(&prefix_size, data, sizeof(Size));
+ data += sizeof(Size);
+ memcpy(&change->data.ddlmsg.relid, data, sizeof(Oid));
+ data += sizeof(Oid);
+ memcpy(&change->data.ddlmsg.cmdtype, data, sizeof(int));
+ data += sizeof(int);
+ change->data.ddlmsg.prefix = MemoryContextAlloc(rb->context, prefix_size);
+ memcpy(change->data.ddlmsg.prefix, data, prefix_size);
+ Assert(change->data.ddlmsg.prefix[prefix_size - 1] == '\0');
+ data += prefix_size;
+
+ /* read the message */
+ memcpy(&change->data.msg.message_size, data, sizeof(Size));
+ data += sizeof(Size);
+ change->data.msg.message = MemoryContextAlloc(rb->context,
+ change->data.msg.message_size);
+ memcpy(change->data.msg.message, data,
+ change->data.msg.message_size);
+ data += change->data.msg.message_size;
31a.
sizeof(DeparsedCommandType) better instead of sizeof(int)?
~
31b.
Uppercase the comments
======
src/backend/replication/logical/worker.c
32. preprocess_create_table
+/* Remove the data population from the command */
+static void
+preprocess_create_table(RawStmt *command)
The comment is too short. Needs more explanation than this.
~~~
33. handle_create_table
+/*
+ * Handle CREATE TABLE command
+ *
+ * Call AddSubscriptionRelState for CREATE TABEL command to set the relstate to
+ * SUBREL_STATE_READY so DML changes on this new table can be
replicated without
+ * having to manually run "alter subscription ... refresh publication"
+ */
Typo "TABEL"
~~~
34. handle_create_table
+ switch (commandTag)
+ {
+ case CMDTAG_CREATE_TABLE:
+ {
+ CreateStmt *cstmt = (CreateStmt *) command->stmt;
+
+ rv = cstmt->relation;
+ }
+ break;
+ default:
+ break;
+ }
+
+ if (!rv)
+ return;
This switch seems overcomplicated since the function only cares about
CMDTAG_CREATE_TABLE.
SUGGESTION
if (commandTag == CMDTAG_CREATE_TABLE)
{
CreateStmt *cstmt = (CreateStmt *) command->stmt;
rv = cstmt->relation;
}
else
{
return;
}
~
35.
+ if (relnamespace != InvalidOid)
+ relid = get_relname_relid(relname, relnamespace);
+ else
+ relid = RelnameGetRelid(relname);
+
+ if (relid != InvalidOid)
+ {
35a.
Maybe better to use the OidIsValid() macro for these places
~
35b.
I'm not 100% sure of this logic. Is it even *possible* for these to be
InvalidOid -- e.g. I thought the CREATE TABLE would have failed
already if this was the case. Maybe these checks can be changed to
Asserts?
~~~
36. apply_handle_ddl
+
+static void
+apply_handle_ddl(StringInfo s)
Missing function comment
======
src/backend/replication/pgoutput/pgoutput.c
37. pgoutput_change
@@ -1377,9 +1386,22 @@ pgoutput_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
ReorderBufferChangeType action = change->action;
TupleTableSlot *old_slot = NULL;
TupleTableSlot *new_slot = NULL;
+ bool table_rewrite = false;
update_replication_progress(ctx, false);
+ /*
+ * For heap rewrites, we might need to replicate them if the rewritten
+ * table publishes rewrite ddl message. So get the actual relation here
+ * and check the pubaction later.
+ */
+ if (relation->rd_rel->relrewrite)
+ {
+ table_rewrite = true;
+ relation = RelationIdGetRelation(relation->rd_rel->relrewrite);
+ targetrel = relation;
+ }
+
if (!is_publishable_relation(relation))
return;
@@ -1413,6 +1435,13 @@ pgoutput_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
Assert(false);
}
+ /*
+ * We don't publish table rewrite change unless we publish the rewrite ddl
+ * message.
+ */
+ if (table_rewrite && !relentry->pubactions.pubddl)
+ return;
+
Something does not seem right. Other code later in this function takes
care to call RelationClose(relation), but in the above change, the
logic is just returning without closing anything.
~~~
38. pgoutput_message
@@ -1671,8 +1714,8 @@ pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
static void
pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
- XLogRecPtr message_lsn, bool transactional, const char *prefix, Size sz,
- const char *message)
+ XLogRecPtr message_lsn, bool transactional,
+ const char *prefix, Size sz, const char *message)
{
This change of wrapping seems unrelated, so should not be done in this patch.
~~~
39. pgoutput_ddlmessage
+static void
+pgoutput_ddlmessage(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr message_lsn,
+ const char *prefix, Oid relid, DeparsedCommandType cmdtype,
+ Size sz, const char *message)
Missing function comment.
~
40.
+ switch (cmdtype)
40a.
Might be tidier to have a consistent space *before* each case of this switch.
~
40b.
I felt it was too confusing having some of the switch case break and
some of the switch cases return from the function -- e.g It seems
difficult to know what conditions will execute the code that follows
the switch. Maybe all this needs to be refactored somehow, or just
commented on more.
======
src/bin/pg_dump/pg_dump.c
41. getPublications
- if (fout->remoteVersion >= 130000)
+ if (fout->remoteVersion >= 150000)
Should be >= 160000, right?
~
42.
else if (fout->remoteVersion >= 110000)
appendPQExpBufferStr(query,
"SELECT p.tableoid, p.oid, p.pubname, "
"p.pubowner, "
- "p.puballtables, p.pubinsert, p.pubupdate, p.pubdelete,
p.pubtruncate, false AS pubviaroot "
+ "p.puballtables, p.pubinsert, p.pubupdate, p.pubdelete,
p.pubtruncate, false as p.pubddl, false AS pubviaroot "
"FROM pg_publication p");
else
appendPQExpBufferStr(query,
"SELECT p.tableoid, p.oid, p.pubname, "
"p.pubowner, "
- "p.puballtables, p.pubinsert, p.pubupdate, p.pubdelete, false AS
pubtruncate, false AS pubviaroot "
+ "p.puballtables, p.pubinsert, p.pubupdate, p.pubdelete, false AS
pubtruncate, false as p.pubddl, false AS pubviaroot "
"FROM pg_publication p");
Use uppercase 'AS' for consistency with other code.
======
src/bin/pg_dump/pg_dump.h
43. PublicationInfo
@@ -620,6 +620,7 @@ typedef struct _PublicationInfo
bool pubdelete;
bool pubtruncate;
bool pubviaroot;
+ bool pubddl;
} PublicationInfo;
IMO the new member should be adjacent to the other 'publish' parameter
values like pubdelete/pubtruncate.
======
src/bin/psql/describe.c
44. listPublications
+ if (pset.sversion >= 140000)
+ appendPQExpBuffer(&buf,
+ ",\n pubddl AS \"%s\"",
+ gettext_noop("DDLs"));
44a.
Should that be 160000?
~
44b.
IMO it would be better if "DLLs" column appeared adjacent to that the
other 'publish' parameter option values. (e.g. these are not even the
same column ordering as pg_dump).
~~~
45. describePublications
has_pubtruncate = (pset.sversion >= 110000);
has_pubviaroot = (pset.sversion >= 130000);
+ has_pubddl = (pset.sversion >= 150000);
Shouldn't that be 160000?
~
46.
@@ -6313,6 +6319,9 @@ describePublications(const char *pattern)
if (has_pubviaroot)
appendPQExpBufferStr(&buf,
", pubviaroot");
+ if (has_pubddl)
+ appendPQExpBufferStr(&buf,
+ ", pubddl");
IMO it would be better if "DLLs" column appeared adjacent to that the
other 'publish' parameter option values. (e.g. these are not even the
same column ordering as pg_dump).
======
src/include/catalog/pg_proc.dat
47.
+{ oid => '4644', descr => 'trigger for ddl command deparse',
+ proname => 'publication_deparse_ddl_command_end', prorettype =>
'event_trigger',
+ proargtypes => '', prosrc => 'publication_deparse_ddl_command_end' },
Why doesn't the description say 'end'?
======
src/include/catalog/pg_publication.h
48. FormData_pg_publication
+
+ /* true if table creations are published */
+ bool pubddl;
} FormData_pg_publication;
Why just table publications? I thought it was for EVERYTHING.
~~~
49. PublicationActions
+ bool pubddl;
} PublicationActions;
This might be OK for POC, but for the real feature, I think this
should be more fine-grained than this all-or-nothing DDL.
======
src/include/replication/ddlmessage.h
50.
+{
+ Oid dbId; /* database Oid emitted from */
+ Size prefix_size; /* length of prefix */
+ Oid relid; /* id of the table */
+ DeparsedCommandType cmdtype; /* type of sql command */
+ Size message_size; /* size of the message */
+
+ /*
+ * payload, including null-terminated prefix of length prefix_size
+ */
+ char message[FLEXIBLE_ARRAY_MEMBER];
+} xl_logical_ddl_message;
50a.
The prefix_size comment needs to say /* length of the prefix
(including '\0' terminator) */
~
50b.
'relid' seems specific to TABLE DDL. Will future versions have many
more Oid members here? Or should this be a union member or a generic
name like 'objid'?
~~~
51. XLOG_LOGICAL_DDL_MESSAGE
+/* RMGR API*/
+#define XLOG_LOGICAL_DDL_MESSAGE 0x00
0x00 is same value as XLOG_LOGICAL_MESSAGE in message.h. That doesn't
seem correct because then how will those different messages be
identified?
======
src/include/replication/logicalproto.h
52. LogicalRepMsgType
@@ -61,6 +61,7 @@ typedef enum LogicalRepMsgType
LOGICAL_REP_MSG_RELATION = 'R',
LOGICAL_REP_MSG_TYPE = 'Y',
LOGICAL_REP_MSG_MESSAGE = 'M',
+ LOGICAL_REP_MSG_DDLMESSAGE = 'L',
LOGICAL_REP_MSG_BEGIN_PREPARE = 'b',
The name already includes _MSG_ so why say MESSAGE again? IMO this
should be called just LOGICAL_REP_MSG_DDL. See general comment.
~~~
53.
extern void logicalrep_write_message(StringInfo out, TransactionId
xid, XLogRecPtr lsn,
- bool transactional, const char *prefix, Size sz, const char *message);
+ bool transactional, const char *prefix,
+ Size sz, const char *message);
Modifying the wrapping of this unrelated function should not be done
in this patch.
======
src/include/replication/reorderbuffer.h
54. REORDER_BUFFER_CHANGE_DDLMESSAGE
@@ -56,6 +58,7 @@ typedef enum ReorderBufferChangeType
REORDER_BUFFER_CHANGE_INSERT,
REORDER_BUFFER_CHANGE_UPDATE,
REORDER_BUFFER_CHANGE_DELETE,
+ REORDER_BUFFER_CHANGE_DDLMESSAGE,
Why not call it REORDER_BUFFER_CHANGE_DDL? -- see general review comment
~~~
55. ReorderBufferChange
+ /* DDL Message. */
+ struct
+ {
+ char *prefix;
+ Size message_size;
+ char *message;
+ Oid relid;
+ DeparsedCommandType cmdtype;
+ } ddlmsg;
+
Why not call it ddl? -- see general review comment
======
src/test/regress/expected/psql.out
56.
\dRp "no.such.publication"
- List of publications
- Name | Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
-------+-------+------------+---------+---------+---------+-----------+----------
+ List of publications
+ Name | Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root | DDLs
+------+-------+------------+---------+---------+---------+-----------+----------+------
(0 rows)
I wondered if "DDLs" belongs adjacent to the
Inserts/Updates/Deletes/Trucates because those are the other "publish"
parameters just like this.
======
src/test/regress/expected/publication.out
57.
(Ditto comment for psql.out)
I wondered if "DDLs" belongs adjacent to the
Inserts/Updates/Deletes/Trucates because those are the other "publish"
parameters just like this.
~~~
58.
Looks like there is a missing regress test case where you actually set
publish='ddl' and then verify that the DDLs column is correctly
set to 't'?
======
59. MISC = typedefs.list
There are some missing typedefs.list changes for this patch. At least
the following:
e.g.
- DeparsedCommandType (from ddlmessage.h)
- xl_logical_ddl_message (from ddlmessage.h)
- LogicalDecodeDDLMessageCB (from output_plugin.h)
- LogicalDecodeStreamDDLMessageCB (from output_plugin.h)
- ReorderBufferDDLMessageCB (from reorderbuffer.h)
- ReorderBufferStreamDDLMessageCB (from reorderbuffer.h)
------
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-16 08:57:14 |
| Message-ID: | CALDaNm1mZ58QRZad-FMoq4jSLC=2kBihBx-hjW_10VdKx0Twbw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, 11 Nov 2022 at 10:48, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Fri, Nov 11, 2022 at 4:09 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > On Fri, Nov 11, 2022 at 3:47 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > > Here are more review comments for the v32-0001 file ddl_deparse.c
> > >
> > > *** NOTE - my review post became too big, so I split it into smaller parts.
> >
>
> THIS IS PART 3 OF 4.
>
> =======
>
> src/backend/commands/ddl_deparse.c
>
> 50. deparse_DefineStmt_Operator
>
> +/*
> + * Deparse a DefineStmt (CREATE OPERATOR)
> + *
> + * Given a trigger OID and the parse tree that created it, return an ObjTree
> + * representing the creation command.
> + */
> +static ObjTree *
> +deparse_DefineStmt_Operator(Oid objectId, DefineStmt *define)
>
> "trigger OID" ?? Is that right?
Modified
>
> 51.
>
> /* MERGES */
> tmpobj = new_objtree_VA("MERGES", 1,
> "clause", ObjTypeString, "merges");
> if (!oprForm->oprcanmerge)
> append_bool_object(tmpobj, "present", false);
> list = lappend(list, new_object_object(tmpobj));
>
> /* HASHES */
> tmpobj = new_objtree_VA("HASHES", 1,
> "clause", ObjTypeString, "hashes");
> if (!oprForm->oprcanhash)
> append_bool_object(tmpobj, "present", false);
> list = lappend(list, new_object_object(tmpobj));
>
>
> Maybe HASHES and MERGES should be done in a different order, just to
> be consistent with the PG documentation [2].
Modified
> 52. deparse_DefineStmt_Type
>
> + /* Shortcut processing for shell types. */
> + if (!typForm->typisdefined)
> + {
> + stmt = new_objtree_VA("CREATE TYPE", 0);
> + append_object_object(stmt, "%{identity}D",
> + new_objtree_for_qualname(typForm->typnamespace,
> + NameStr(typForm->typname)));
> + append_bool_object(stmt, "present", true);
> + ReleaseSysCache(typTup);
> + return stmt;
> + }
> +
> + stmt = new_objtree_VA("CREATE TYPE", 0);
> + append_object_object(stmt, "%{identity}D",
> + new_objtree_for_qualname(typForm->typnamespace,
> + NameStr(typForm->typname)));
> + append_bool_object(stmt, "present", true);
>
> 52a.
> This code looked strange because everything is the same except the
> Release/return, so IMO it should be refactored to use the common code.
Modified to remove duplicate code
> 52b.
> The VA(0 args) should be combined with the subsequent appends to use
> fewer append_XXX calls.
Modified
> 53.
> Is it necessary to say append_bool_object(stmt, "present", true); ? --
> I'd assumed that is the default unless it explicitly says false.
Removed it.
> 54.
>
> /* INPUT */
> tmp = new_objtree_VA("(INPUT=", 1,
> "clause", ObjTypeString, "input");
> append_object_object(tmp, "%{procedure}D",
> new_objtree_for_qualname_id(ProcedureRelationId,
> typForm->typinput));
> list = lappend(list, new_object_object(tmp));
>
> /* OUTPUT */
> tmp = new_objtree_VA("OUTPUT=", 1,
> "clause", ObjTypeString, "output");
> append_object_object(tmp, "%{procedure}D",
> new_objtree_for_qualname_id(ProcedureRelationId,
> typForm->typoutput));
> list = lappend(list, new_object_object(tmp));
>
> These could each be simplified into single VA() function calls, the
> same as was done in deparse_DefineStmt_Operator PROCEDURE.
>
> And the same comment applies to other parts. e.g.:
> - /* CATEGORY */
> - /* ALIGNMENT */
> - STORAGE
Modified
> 55.
>
> + tmp = new_objtree_VA("STORAGE=", 1,
> + "clause", ObjTypeString, "storage");
>
> Missing comment above this to say /* STORAGE */
Modified
> 56.
>
> + /* INTERNALLENGTH */
> + if (typForm->typlen == -1)
> + {
> + tmp = new_objtree_VA("INTERNALLENGTH=VARIABLE", 0);
> + }
> + else
> + {
> + tmp = new_objtree_VA("INTERNALLENGTH=%{typlen}n", 1,
> + "typlen", ObjTypeInteger, typForm->typlen);
> + }
>
> 56a.
> The VA(args = 0) does not need to be a VA function.
This is already changed as part of another comment fix
> 56b.
> The { } blocks are unnecessary
Modified
> 57. deparse_DefineStmt_TSConfig
>
> +
> +static ObjTree *
> +deparse_DefineStmt_TSConfig(Oid objectId, DefineStmt *define,
> + ObjectAddress copied)
>
> Missing function comment.
Modified
> 58.
>
> + stmt = new_objtree("CREATE");
> +
> + append_object_object(stmt, "TEXT SEARCH CONFIGURATION %{identity}D",
> + new_objtree_for_qualname(tscForm->cfgnamespace,
> + NameStr(tscForm->cfgname)));
>
> Why not combine these using VA() function?
Modified
> 59.
>
> + list = NIL;
> + /* COPY */
>
> Just assign NIL when declared.
Modified
> 60.
>
> + if (copied.objectId != InvalidOid)
>
> Use OidIsValid macro.
Modified
> 61. deparse_DefineStmt_TSParser
>
> +
> +static ObjTree *
> +deparse_DefineStmt_TSParser(Oid objectId, DefineStmt *define)
>
> Missing function comment.
Modified
> 62.
>
> + stmt = new_objtree("CREATE");
> +
> + append_object_object(stmt, "TEXT SEARCH PARSER %{identity}D",
> + new_objtree_for_qualname(tspForm->prsnamespace,
> + NameStr(tspForm->prsname)));
>
> Why not combine as a single VA() function call?
Modified
> 63.
>
> + list = NIL;
>
> Just assign NIL when declared
Modified
> 64.
>
> tmp = new_objtree_VA("START=", 1,
> "clause", ObjTypeString, "start");
> append_object_object(tmp, "%{procedure}D",
> new_objtree_for_qualname_id(ProcedureRelationId,
> tspForm->prsstart));
>
>
> Easily combined to be a single VA() function call.
>
> The same comment applies for
> - /* GETTOKEN */
> - /* END */
> - /* LEXTYPES */
Modified
> 65. deparse_DefineStmt_TSDictionary
>
> +static ObjTree *
> +deparse_DefineStmt_TSDictionary(Oid objectId, DefineStmt *define)
>
> Missing function comment.
Modified
> 66.
>
> + stmt = new_objtree("CREATE");
> +
> + append_object_object(stmt, "TEXT SEARCH DICTIONARY %{identity}D",
> + new_objtree_for_qualname(tsdForm->dictnamespace,
> + NameStr(tsdForm->dictname)));
>
> Why not combine this as a single VA() function call?
Modified
> 67.
>
> + list = NIL;
>
> Just assign NIL when declared
Modified
> 68.
>
> + tmp = new_objtree_VA("", 0);
>
> Don't need VA() function for 0 args.
Modified
> 69. deparse_DefineStmt_TSTemplate
>
> +static ObjTree *
> +deparse_DefineStmt_TSTemplate(Oid objectId, DefineStmt *define)
>
> Missing function comment.
Modified
> 70.
>
> + stmt = new_objtree("CREATE");
> +
> + append_object_object(stmt, "TEXT SEARCH TEMPLATE %{identity}D",
> + new_objtree_for_qualname(tstForm->tmplnamespace,
> + NameStr(tstForm->tmplname)));
>
> Combine this to be a single VA() function call.
Modified
> 71.
>
> + list = NIL;
>
> Just assign NIL when declared
Modified
> 72.
>
> + tmp = new_objtree_VA("LEXIZE=", 1,
> + "clause", ObjTypeString, "lexize");
> + append_object_object(tmp, "%{procedure}D",
> + new_objtree_for_qualname_id(ProcedureRelationId,
> + tstForm->tmpllexize));
>
> Combine this to be a single VA() function call.
Modified
> 73. deparse_AlterTSConfigurationStmt
>
> +static ObjTree *
> +deparse_AlterTSConfigurationStmt(CollectedCommand *cmd)
>
> Missing function comment.
Modified
> 74.
>
> + /* determine the format string appropriate to each subcommand */
> + switch (node->kind)
>
> Uppercase comment
Modified
> 75.
>
> + tmp = new_objtree_VA("IF EXISTS", 0);
>
> Should not use a VA() function with 0 args.
Modified
> 76.
>
> + case ALTER_TSCONFIG_ALTER_MAPPING_FOR_TOKEN:
> + append_object_object(config, "%{identity}D ALTER MAPPING",
> + new_objtree_for_qualname_id(cmd->d.atscfg.address.classId,
> + cmd->d.atscfg.address.objectId));
> + break;
> +
> + case ALTER_TSCONFIG_REPLACE_DICT:
> + append_object_object(config, "%{identity}D ALTER MAPPING",
> + new_objtree_for_qualname_id(cmd->d.atscfg.address.classId,
> + cmd->d.atscfg.address.objectId));
> + break;
> +
> + case ALTER_TSCONFIG_REPLACE_DICT_FOR_TOKEN:
> + append_object_object(config, "%{identity}D ALTER MAPPING",
> + new_objtree_for_qualname_id(cmd->d.atscfg.address.classId,
> + cmd->d.atscfg.address.objectId));
> + break;
>
> If all these 3 cases have identical code then why repeat it three times?
Modified
> 77.
>
> + /* add further subcommand-specific elements */
>
> Uppercase comment
Modified
> 78.
>
> + /* the REPLACE forms want old and new dictionaries */
> + Assert(cmd->d.atscfg.ndicts == 2);
>
> Uppercase comment.
>
> ------
>
> 79. deparse_AlterTSDictionaryStmt
>
> +
> +static ObjTree *
> +deparse_AlterTSDictionaryStmt(Oid objectId, Node *parsetree)
>
> Missing function comment
Modified
> 80.
>
> + alterTSD = new_objtree("ALTER TEXT SEARCH DICTIONARY");
> +
> + append_object_object(alterTSD, "%{identity}D",
> + new_objtree_for_qualname(tsdForm->dictnamespace,
> + NameStr(tsdForm->dictname)));
>
> Combine this as a single VA() function call
Modified
> 81.
>
> + tmp = new_objtree_VA("", 0);
>
> Don't use the VA() function for 0 args.
Modified
> 82. deparse_RelSetOptions
>
> + if (is_reset)
> + fmt = "RESET ";
> + else
> + fmt = "SET ";
> +
> + relset = new_objtree(fmt);
>
> 82a.
> Those format trailing spaces are a bit unusual. The append_XXX will
> take care of space separators anyhow so it is not needed like this.
This code will get removed because of your next comment
> 82b.
> This can all be simplified to one line:
>
> relset = new_objtree(is_reset ? "RESET" : "SET");
Modified
> 83. deparse_ViewStmt
>
> + * Given a view OID and the parsetree that created it, return an ObjTree
> + * representing the creation command.
> + */
>
> Be consistent with other function headers:
>
> "parsetree" -> "parse tree".
Modified
> 84.
>
> + viewStmt = new_objtree("CREATE");
> +
> + append_string_object(viewStmt, "%{or_replace}s",
> + node->replace ? "OR REPLACE" : "");
> +
> + append_string_object(viewStmt, "%{persistence}s",
> + get_persistence_str(relation->rd_rel->relpersistence));
> +
> + tmp = new_objtree_for_qualname(relation->rd_rel->relnamespace,
> + RelationGetRelationName(relation));
> +
> + append_object_object(viewStmt, "VIEW %{identity}D", tmp);
> +
> + append_string_object(viewStmt, "AS %{query}s",
> + pg_get_viewdef_internal(objectId));
>
> IMO all of this can be combined in a single VA() function call.
Modified
> 85. deparse_CreateTableAsStmt_vanilla
>
> +/*
> + * Deparse CREATE Materialized View statement, it is a variant of
> CreateTableAsStmt
> + *
> + * Note that CREATE TABLE AS SELECT INTO can also be deparsed by
> + * deparse_CreateTableAsStmt to remove the SELECT INTO clause.
> + */
> +static ObjTree *
> +deparse_CreateTableAsStmt_vanilla(Oid objectId, Node *parsetree)
>
> The function comment refers to 'deparse_CreateTableAsStmt' but I don't
> see any such function. Maybe this was renamed, causing the comment to
> become stale?
deparse_CreateTableAsStmt is present in the ddl_deparse.c file; it is
required to handle the SCT_CreateTableAs case.
> 86.
>
> + /* Add identity */
> + append_object_object(createStmt, "%{identity}D",
> + new_objtree_for_qualname_id(RelationRelationId,
> + objectId));
>
> This could be included as another arg of the preceding VA() call.
Modified
> 87.
>
> + /* COLLUMNS clause */
> + if (node->into->colNames == NIL)
> + tmp = new_objtree_VA("", 1,
> + "present", ObjTypeBool, false);
> + else
>
> 87a.
> Typo "COLLUMNS"
Modified
> 87b.
> It might be more usual/natural to reverse this if/else to check the
> list is NOT empty. e.g.
>
> if (node->into->colNames)
> ...
> else
> tmp = new_objtree_VA("", 1,
> "present", ObjTypeBool, false);
Modified
> 88.
>
> + tmp = new_objtree("USING");
> + if (node->into->accessMethod)
> + append_string_object(tmp, "%{access_method}I", node->into->accessMethod);
> + else
> + {
> + append_null_object(tmp, "%{access_method}I");
> + append_bool_object(tmp, "present", false);
> + }
>
> I'm not sure why a null object is necessary when present = false.
This code was intended to generate a verbose JSON node for the "USING
access_method" clause, so that the user can easily modify the command by
changing the value of access_method to generate a new DDL command with
the access method specified. I'm retaining this for now.
> 89.
>
> + /* WITH clause */
> + tmp = new_objtree_VA("WITH", 0);
>
> VA() function call is not needed when there are 0 args.
This is already fixed.
> 90.
>
> + /* TABLESPACE clause */
> + tmp = new_objtree_VA("TABLESPACE %{tablespace}I", 0);
>
> VA() function call not needed when there are 0 args.
This is already fixed
> 91.
>
> + else
> + {
> + append_null_object(tmp, "%{tablespace}I");
> + append_bool_object(tmp, "present", false);
> + }
>
> I'm not sure why a null object is necessary when present = false.
This code was intended to generate a verbose JSON node for the
TABLESPACE clause, so that the user can easily modify the command by
changing the value of tablespace to generate a new DDL command with the
tablespace specified.
> 92.
>
> + /* add a WITH NO DATA clause */
> + tmp = new_objtree_VA("WITH NO DATA", 1,
> + "present", ObjTypeBool,
> + node->into->skipData ? true : false);
> + append_object_object(createStmt, "%{with_no_data}s", tmp);
>
> 92a.
> Uppercase comment.
Modified
> 92b.
> It is a bit confusing that this style of specifying empty tree (just
> saying present/not present) is used here. But elsewhere in this patch
> for similar syntax it just adds text or an empty string.
> e.g.
> + append_string_object(renameStmt, "%{if_exists}s",
> + node->missing_ok ? "IF EXISTS" : "");
>
> IMO it's better to apply a consistent deparse approach for everything.
> But without documentation of the deparse structure, it is kind of
> impossible to know even what *are* the rules?
I have not handled this comment; I felt we can take it up separately
along with the documentation of deparsing.
> 93. deparse_CreateTrigStmt
>
> + trigger = new_objtree("CREATE");
> +
> + append_string_object(trigger, "%{constraint}s",
> + node->isconstraint ? "CONSTRAINT" : "");
> +
> + append_string_object(trigger, "TRIGGER %{name}I", node->trigname);
>
> All this can be combined into a single VA() call.
Modified
> 94.
>
> + if (node->timing == TRIGGER_TYPE_BEFORE)
> + append_string_object(trigger, "%{time}s", "BEFORE");
> + else if (node->timing == TRIGGER_TYPE_AFTER)
> + append_string_object(trigger, "%{time}s", "AFTER");
> + else if (node->timing == TRIGGER_TYPE_INSTEAD)
> + append_string_object(trigger, "%{time}s", "INSTEAD OF");
> + else
> + elog(ERROR, "unrecognized trigger timing type %d", node->timing);
>
> It might be better to assign the value to a char* and then just have
> only a single append_string_object() call.
>
> char *tval =
> node->timing == TRIGGER_TYPE_BEFORE ? "BEFORE" :
> node->timing == TRIGGER_TYPE_AFTER ? "AFTER" :
> node->timing == TRIGGER_TYPE_INSTEAD ? "INSTEAD OF" :
> NULL;
> if (tval == NULL)
> elog(ERROR, "unrecognized trigger timing type %d", node->timing);
> append_string_object(trigger, "%{time}s", tval);
Modified
> 95.
>
> + tmpobj = new_objtree_VA("FROM", 0);
>
> VA() function call is not needed for 0 args.
This is fixed already as part of another comment fix
> 96.
>
> + tmpobj = new_objtree_VA("WHEN", 0);
>
> VA() function call is not needed for 0 args.
This is fixed already as part of another comment fix
> 97.
>
> Should use consistent wording for unexpected nulls.
>
> e.g.1
> + if (isnull)
> + elog(ERROR, "bogus NULL tgqual");
>
> e.g.2
> + if (isnull)
> + elog(ERROR, "invalid NULL tgargs");
Modified to keep it consistent
> 98.
>
> + append_format_string(tmpobj, "(");
> + append_array_object(tmpobj, "%{args:, }L", list); /* might be NIL */
> + append_format_string(tmpobj, ")");
>
> IMO probably that can be a single call to append_array_object which
> includes the enclosing parens.
We cannot change it to a single call because in some cases the list can
be NIL; if the list is empty, append_array_object returns without
appending anything, so the "(" would never be emitted. For an empty list
we should still append "()". I'm not making any change for this.
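For example, for a trigger created with no arguments (an illustrative
statement, not taken from the tests), tgargs is empty but the deparsed
command must still end with empty parentheses after the function name:

CREATE TRIGGER trig_no_args
    BEFORE INSERT ON some_table
    FOR EACH ROW EXECUTE FUNCTION some_trigger_func();
-- The deparsed command should still contain "EXECUTE FUNCTION some_trigger_func()",
-- which is why the "(" and ")" are appended unconditionally.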
Thanks for the comments, the attached v37 patch has the changes for the same.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v37-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | text/x-patch | 15.0 KB |
| v37-0004-Test-cases-for-DDL-replication.patch | text/x-patch | 24.6 KB |
| v37-0002-Support-DDL-replication.patch | text/x-patch | 133.5 KB |
| v37-0001-Functions-to-deparse-DDL-commands.patch | text/x-patch | 318.4 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-17 19:01:50 |
| Message-ID: | CAAD30UJ2MmA7vM1H2b20L_SMHS0-76raROqZELs-GDGk3Pet5A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Oct 28, 2022 at 2:50 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are some review comments for patch v32-0001.
>
> This is a WIP - I have not yet looked at the largest file of this
> patch (src/backend/commands/ddl_deparse.c)
>
> ======
>
> Commit Message
>
> 1.
>
> The list of the supported statements should be in alphabetical order
> to make it easier to read
Updated the list in the commit message.
> 2.
>
> The "Notes" are obviously notes, so the text does not need to say
> "Note that..." etc again
>
> "(Note #1) Note that some..." -> "(Note #1) Some..."
>
> "(Note #2) Note that, for..." -> "(Note #2) For..."
>
> "(Note #4) Note that, for..." -> "(Note #4) For..."
Modified.
> 3.
>
> For "Note #3", use uppercase for the SQL keywords in the example.
Modified.
> 4.
>
> For "Note #4":
>
> "We created" -> "we created"
Modified.
> ======
>
> src/backend/catalog/aclchk.c
>
> 5. ExecuteGrantStmt
>
> @@ -385,7 +385,11 @@ ExecuteGrantStmt(GrantStmt *stmt)
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("grantor must be current user")));
> +
> + istmt.grantor_uid = grantor;
> }
> + else
> + istmt.grantor_uid = InvalidOid;
>
> This code can be simpler by just declaring the 'grantor' variable at
> function scope, then assigning the istmt.grantor_uid along with the
> other grantor assignments.
>
> SUGGESTION
> Oid grantor = InvalidOid;
> ...
> istmt.grantor_uid = grantor;
> istmt.is_grant = stmt->is_grant;
> istmt.objtype = stmt->objtype;
Modified.
> ======
>
> src/backend/commands/collationcmds.c
>
> 6. DefineCollation
>
> + /* make from existing collationid available to callers */
> + if (from_collid && OidIsValid(collid))
> + ObjectAddressSet(*from_collid,
> + CollationRelationId,
> + collid);
>
> 6a.
> Maybe the param can be made 'from_existing_colid', then the above code
> comment can be made more readable?
Modified.
> 6b.
> Seems some unnecessary wrapping here
Modified.
> 7. convSpecifier
>
> typedef enum
> {
> SpecTypename,
> SpecOperatorname,
> SpecDottedName,
> SpecString,
> SpecNumber,
> SpecStringLiteral,
> SpecIdentifier,
> SpecRole
> } convSpecifier;
>
> Inconsistent case. Some of these say "name" and some say "Name"
Modified.
> 8. Forward declarations
>
> char *ddl_deparse_json_to_string(char *jsonb);
>
> Is this needed here? I thought this was already declared extern in
> ddl_deparse.h.
It is needed. We get the following warning without it:
ddl_json.c:704:1: warning: no previous prototype for
‘ddl_deparse_json_to_string’ [-Wmissing-prototypes]
ddl_deparse_json_to_string(char *json_str)
> 9. find_string_in_jsonbcontainer
>
> The function comment says "If it's of a type other than jbvString, an
> error is raised.", but I do not see this check in the function code.
Modified.
> 10. expand_fmt_recursive
>
> /*
> * Recursive helper for pg_event_trigger_expand_command
> *
> * Find the "fmt" element in the given container, and expand it into the
> * provided StringInfo.
> */
>
>
> 10a.
> I am not sure if the mention of "pg_event_trigger_expand_command" is
> stale or is not relevant anymore, because that caller is not in this
> module.
Modified.
> 10b.
> The first sentence is missing a period.
Modified.
> 11.
>
> value = findJsonbValueFromContainer(container, JB_FOBJECT, &key);
>
> Should this be checking if value is NULL?
The NULL check for value is done in the subsequent call to
expand_one_jsonb_element().
> 12. expand_jsonval_dottedname
>
> * Expand a json value as a dot-separated-name. The value must be of type
> * object and may contain elements "schemaname" (optional), "objname"
> * (mandatory), "attrname" (optional). Double quotes are added to each element
> * as necessary, and dot separators where needed.
>
> The comment says "The value must be of type object" but I don't see
> any check/assert for that in the code.
The value must be of type binary; updated the comment and added
Assert(jsonval->type == jbvBinary);
> 13. expand_jsonval_typename
>
> In other code (e.g. expand_jsonval_dottedname) there are lots of
> pfree(str) so why not similar here?
>
> e.g. Shouldn't the end of the function have something like shown below:
> pfree(schema);
> pfree(typename);
> pfree(typmodstr);
Modified.
> 14. expand_jsonval_operator
>
> The function comment is missing a period.
Modified.
> 15. expand_jsonval_string
>
> /*
> * Expand a JSON value as a string. The value must be of type string or of
> * type object. In the latter case, it must contain a "fmt" element which will
> * be recursively expanded; also, if the object contains an element "present"
> * and it is set to false, the expansion is the empty string.
>
> 15a.
> Although the comment says "The value must be of type string or of type
> object" the code is checking for jbvString and jbvBinary (??)
Updated the comment to “The value must be of type string or of type
binary”
> 15b.
> else
> return false;
>
> Is that OK to just return false, or should this in fact be throwing an
> error if the wrong type?
The caller checks the type is either jbvString or jbvBinary. Added comment
“The caller is responsible to check jsonval is of type jbvString or jbvBinary”.
> 16. expand_jsonval_strlit
>
> /* Easy case: if there are no ' and no \, just use a single quote */
> if (strchr(str, '\'') == NULL &&
> strchr(str, '\\') == NULL)
>
> That could be simplified as:
>
> if (strpbrk(str, "\'\\") == NULL)
Modified.
> 17. expand_jsonval_number
>
> strdatum = DatumGetCString(DirectFunctionCall1(numeric_out,
>
> NumericGetDatum(jsonval->val.numeric)));
> appendStringInfoString(buf, strdatum);
>
> Shouldn't this function do pfree(strdatum) at the end?
Modified.
> 18. expand_jsonval_role
>
> /*
> * Expand a JSON value as a role name. If the is_public element is set to
> * true, PUBLIC is expanded (no quotes); otherwise, expand the given role name,
> * quoting as an identifier.
> */
>
>
> Maybe better to quote that element name -> 'If the "is_public" element
> is set to true...'
I think we need to quote the non-public roles just in case they
contain special characters.
> 19. expand_one_jsonb_element
>
> The enum jbvType definition says that jbvBinary is a combination of
> array/object, so I am not sure if that should be reflected in the
> errmsg text (multiple places in this function body) instead of only
> saying "JSON object".
Updated errmsg texts to “JSON struct”.
> 20. ddl_deparse_expand_command
>
> * % expand to a literal %.
>
>
> Remove the period from that line (because none of the other specifier
> descriptions have one).
Modified.
> ======
>
> src/backend/utils/adt/regproc.c
>
> 21. format_procedure_args_internal
>
> +static void
> +format_procedure_args_internal(Form_pg_proc procform, StringInfo buf,
> + bool force_qualify)
> +{
> + int i;
> + int nargs = procform->pronargs;
>
> The 'nargs' var is used one time only, so hardly seems worth having it.
Modified.
> 22.
>
> + appendStringInfoString(buf,
> + force_qualify ?
> + format_type_be_qualified(thisargtype) :
> + format_type_be(thisargtype));
>
> 22a.
> Should these function results be assigned to a char* ptr so that they
> can be pfree(ptr) AFTER being appended to the 'buf'?
Modified.
> 22b.
> It's not really necessary to check the force_qualify at every
> iteration. More efficient to assign a function pointer outside this loop
> and just call that here. IIRC something like this:
>
> char * (*func[2])(Oid) = { format_type_be, format_type_be_qualified };
>
> ...
>
> then
> appendStringInfoString(buf, func[force_qualify](thisargtype))
Modified.
> src/backend/utils/adt/ruleutils.c
>
> 23. pg_get_ruledef_detailed
>
> Instead of the multiple if/else it might be easier to just assign up-front:
> *whereClause = NULL;
> *actions = NIL;
>
> Then the if blocks can just overwrite them.
>
> Also, if you do that, then I expect probably the 'output' temp list
> var is not needed at all.
Modified.
> 24. pg_get_viewdef_internal
>
> /*
> * In the case that the CREATE VIEW command execution is still in progress,
> * we would need to search the system cache RULERELNAME to get the rewrite
> * rule of the view as oppose to querying pg_rewrite as in
> pg_get_viewdef_worker(),
> * the latter will return empty result.
> */
>
> 24a.
> I'm not quite sure of the context of this function call. Maybe the
> comment was supposed to be worded more like below?
>
> "Because this function is called when CREATE VIEW command execution is
> still in progress, we need to search..."
Improved comment.
> 24b.
> "as oppose" -> "as opposed"
Modified.
> 25. pg_get_triggerdef_worker
>
> if (!isnull)
> {
> Node *qual;
> char *qualstr;
>
> qual = stringToNode(TextDatumGetCString(value));
> qualstr = pg_get_trigger_whenclause(trigrec, qual, pretty);
>
> appendStringInfo(&buf, "WHEN (%s) ", qualstr);
> }
>
> After appending the qualstr to buf, should there be a pfree(qualstr)?
I think we should skip pfree(qualstr) here since the memory is allocated
by initStringInfo in pg_get_trigger_whenclause, to avoid double free when
the StringInfoData in pg_get_trigger_whenclause gets freed.
> 26. pg_get_trigger_whenclause
>
> Missing function comment.
Added comment.
> 27. print_function_sqlbody
>
> -static void
> +void
> print_function_sqlbody(StringInfo buf, HeapTuple proctup)
> {
>
> Missing function comment. Probably having a function comment is more
> important now that this is not static?
Added comment.
> src/include/tcop/ddl_deparse.h
>
> 28.
>
> +extern char *deparse_utility_command(CollectedCommand *cmd, bool verbose_mode);
> +extern char *ddl_deparse_json_to_string(char *jsonb);
> +extern char *deparse_drop_command(const char *objidentity, const char
> *objecttype,
> + DropBehavior behavior);
>
> Function naming seems inconsistent. ('ddl_deparse_XXX' versus 'deparse_XXX').
Modified.
Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| v38-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v38-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v38-0002-Support-DDL-replication.patch | application/octet-stream | 133.5 KB |
| v38-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 319.7 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-20 03:59:47 |
| Message-ID: | CALDaNm3GRSBPAWAAeLeE8tBEfKkSH7zDx3gxk9ZPaA6NgARtZw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, 11 Nov 2022 at 11:03, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Fri, Nov 11, 2022 at 4:17 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > On Fri, Nov 11, 2022 at 4:09 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Nov 11, 2022 at 3:47 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > >
> > > > Here are more review comments for the v32-0001 file ddl_deparse.c
> > > >
> > > > *** NOTE - my review post became too big, so I split it into smaller parts.
> > >
> >
>
> THIS IS PART 4 OF 4.
>
> =======
>
> src/backend/commands/ddl_deparse.c
>
> 99. deparse_CreateUserMappingStmt
>
> + /*
> + * Lookup up object in the catalog, so we don't have to deal with
> + * current_user and such.
> + */
>
> Typo "Lookup up"
Modified
> 100.
>
> + createStmt = new_objtree("CREATE USER MAPPING ");
> +
> + append_object_object(createStmt, "FOR %{role}R",
> new_objtree_for_role_id(form->umuser));
> +
> + append_string_object(createStmt, "SERVER %{server}I", server->servername);
>
> All this can be combined into a single VA() function call.
Modified
> 101.
>
> + /* add an OPTIONS clause, if any */
>
> Uppercase comment.
Modified
> 102. deparse_AlterUserMappingStmt
>
> + /*
> + * Lookup up object in the catalog, so we don't have to deal with
> + * current_user and such.
> + */
> +
> + tp = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(objectId));
>
> 102a.
> Typo "Lookup up"
Modified
> 102b.
> Unnecessary blank line.
Modified
> 103.
>
> + alterStmt = new_objtree("ALTER USER MAPPING");
> +
> + append_object_object(alterStmt, "FOR %{role}R",
> new_objtree_for_role_id(form->umuser));
> +
> + append_string_object(alterStmt, "SERVER %{server}I", server->servername);
>
> Can be combined into a single VA() function call.
Modified
> 104.
> + /* add an OPTIONS clause, if any */
>
> Uppercase comment
Modified
> 105. deparse_AlterStatsStmt
>
> + alterStat = new_objtree("ALTER STATISTICS");
> +
> + /* Lookup up object in the catalog */
> + tp = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(objectId));
> + if (!HeapTupleIsValid(tp))
> + elog(ERROR, "cache lookup failed for statistic %u", objectId);
> +
> + statform = (Form_pg_statistic_ext) GETSTRUCT(tp);
> +
> + append_object_object(alterStat, "%{identity}D",
> + new_objtree_for_qualname(statform->stxnamespace,
> + NameStr(statform->stxname)));
> +
> + append_float_object(alterStat, "SET STATISTICS %{target}n",
> node->stxstattarget);
>
> 105a.
> It was a bit unusual to put the new_objtree even before the catalog lookup.
Modified
> 105b.
> All new_objtreee and append_XXX can be combined as a single VA()
> function call here.
Modified
> 106. deparse_RefreshMatViewStmt
>
> + refreshStmt = new_objtree_VA("REFRESH MATERIALIZED VIEW", 0);
> +
> + /* Add a CONCURRENTLY clause */
> + append_string_object(refreshStmt, "%{concurrently}s",
> + node->concurrent ? "CONCURRENTLY" : "");
> + /* Add the matview name */
> + append_object_object(refreshStmt, "%{identity}D",
> + new_objtree_for_qualname_id(RelationRelationId,
> + objectId));
> + /* Add a WITH NO DATA clause */
> + tmp = new_objtree_VA("WITH NO DATA", 1,
> + "present", ObjTypeBool,
> + node->skipData ? true : false);
> + append_object_object(refreshStmt, "%{with_no_data}s", tmp);
>
> 106a.
> Don't use VA() function for 0 args.
This has been fixed already
> 106b.
> Huh? There are 2 different implementation styles here for the optional clauses
> - CONCURRENTLY just replaces with an empty string
> - WITH NO DATA - has a new ObjTree either present/not present
I have not made any changes for this; we can handle it together when we
take care of present/not present consistency across all of them.
> 106c.
> Most/all of this can be combined into a single VA call.
Modified
> 107. deparse_DefElem
>
> + set = new_objtree("");
> + optname = new_objtree("");
> +
> + if (elem->defnamespace != NULL)
> + append_string_object(optname, "%{schema}I.", elem->defnamespace);
> +
> + append_string_object(optname, "%{label}I", elem->defname);
> +
> + append_object_object(set, "%{label}s", optname);
>
> The set should be created *after* the optname, then it can be done
> something like:
>
> set = new_objtree_VA("%{label}s", 1, "label", ObjTypeString, optname);
Modified
> 108.
>
> + append_string_object(set, " = %{value}L",
> + elem->arg ? defGetString(elem) :
> + defGetBoolean(elem) ? "TRUE" : "FALSE");
>
> The calling code does not need to prefix the format with spaces like
> this. The append_XXX will handle space separators automatically.
Modified
> 109. deparse_drop_command
>
> + stmt = new_objtree_VA(fmt, 1, "objidentity", ObjTypeString, identity);
> + stmt2 = new_objtree_VA("CASCADE", 1,
> + "present", ObjTypeBool, behavior == DROP_CASCADE);
> +
> + append_object_object(stmt, "%{cascade}s", stmt2);
>
> 109a.
> 'stmt2' is a poor name. "CASCADE" is not a statement. Even 'tmpobj'
Modified
> 109b.
> The 2 lines of cascade should be grouped together -- i.e. the blank
> line should be *above* the "CASCADE", not below it.
Modified
> 110. deparse_FunctionSet
>
> + obj = new_objtree("RESET");
> + append_string_object(obj, "%{set_name}I", name);
>
> This can be combined as a single VA() call with a format "RESET %{set_name}I".
Modified
> 111.
>
> + if (kind == VAR_RESET_ALL)
> + {
> + obj = new_objtree("RESET ALL");
> + }
> + else if (value != NULL)
>
>
> It seems a bit strange that the decision is judged sometimes by the
> *value*. Why isn't this just decided according to the different
> VariableSetKind values (e.g. VAR_SET_VALUE)?
Modified
> 112. deparse_IndexStmt
>
> + indexStmt = new_objtree("CREATE");
> +
> + append_string_object(indexStmt, "%{unique}s",
> + node->unique ? "UNIQUE" : "");
> +
> + append_format_string(indexStmt, "INDEX");
> +
> + append_string_object(indexStmt, "%{concurrently}s",
> + node->concurrent ? "CONCURRENTLY" : "");
> +
> + append_string_object(indexStmt, "%{if_not_exists}s",
> + node->if_not_exists ? "IF NOT EXISTS" : "");
> +
> + append_string_object(indexStmt, "%{name}I",
> + RelationGetRelationName(idxrel));
> +
> + append_object_object(indexStmt, "ON %{table}D",
> + new_objtree_for_qualname(heaprel->rd_rel->relnamespace,
> + RelationGetRelationName(heaprel)));
> +
> + append_string_object(indexStmt, "USING %{index_am}s", index_am);
> +
> + append_string_object(indexStmt, "(%{definition}s)", definition);
>
> This could all be combined to a single VA() function call.
Modified
> 113. deparse_OnCommitClause
>
> + case ONCOMMIT_NOOP:
> + append_null_object(oncommit, "%{on_commit_value}s");
> + append_bool_object(oncommit, "present", false);
> + break;
>
> Why is the null object necessary when the entire "ON COMMIT" is present=false?
This code was intended to generate a verbose JSON node for "ON
COMMIT", so that the user can easily modify the command by changing the
value of ON COMMIT to generate a new DDL command.
> 114. deparse_RenameStmt
>
> + renameStmt = new_objtree_VA(fmtstr, 0);
> + append_string_object(renameStmt, "%{if_exists}s",
> + node->missing_ok ? "IF EXISTS" : "");
> + append_object_object(renameStmt, "%{identity}D",
> + new_objtree_for_qualname(schemaId,
> + node->relation->relname));
> + append_string_object(renameStmt, "RENAME TO %{newname}I",
> + node->newname);
>
> 114a.
> Don't use VA() for 0 args.
This was already fixed.
> 114b.
> Anyway, all these can be combined to a single new_objtree_VA() call.
Modified
> 115.
>
> + renameStmt = new_objtree_VA("ALTER POLICY", 0);
> + append_string_object(renameStmt, "%{if_exists}s",
> + node->missing_ok ? "IF EXISTS" : "");
> + append_string_object(renameStmt, "%{policyname}I", node->subname);
> + append_object_object(renameStmt, "ON %{identity}D",
> + new_objtree_for_qualname_id(RelationRelationId,
> + polForm->polrelid));
> + append_string_object(renameStmt, "RENAME TO %{newname}I",
> + node->newname);
>
> All these can be combined into a single VA() call.
Modified
> 116.
>
> relation_close(pg_policy, AccessShareLock);
>
> }
> break;
>
> case OBJECT_ATTRIBUTE:
>
> Spurious blank line before the }
Modified
> 117.
>
> + objtype = stringify_objtype(node->relationType);
> + fmtstr = psprintf("ALTER %s", objtype);
> + renameStmt = new_objtree(fmtstr);
>
> The code seems over-complicated. All these temporary assignments are
> not really necessary.
>
> Maybe better remove the psprintf anyway, as per my general comment at
> top of this review post.
Here psprintf cannot be removed because node->relationType is not a
fixed type, and a variable string cannot be used in the fmt argument of
new_objtree_VA. However, objtype can be removed; I have removed it.
> 118.
>
> + relation_close(relation, AccessShareLock);
> +
> + break;
> + case OBJECT_FUNCTION:
>
>
> The misplaced blank line should be *after* the break; not before it.
Modified
> 119.
>
> + char *fmt;
> +
> + fmt = psprintf("ALTER %s %%{identity}D USING %%{index_method}s
> RENAME TO %%{newname}I",
> + stringify_objtype(node->renameType));
>
> Let's be consistent about the variable naming at least within the same
> function. Elsewhere 'fmt' was 'fmtstr', so make them all the same
> (pick one).
Removed fmt variable and used the existing fmtstr variable
> 120.
>
> + objtype = stringify_objtype(node->renameType);
> + fmtstring = psprintf("ALTER %s", objtype);
> +
> + renameStmt = new_objtree_VA(fmtstring,
> + 0);
> + append_object_object(renameStmt, "%{identity}D",
> + new_objtree_for_qualname(DatumGetObjectId(objnsp),
> + strVal(llast(identity))));
> +
> + append_string_object(renameStmt, "RENAME TO %{newname}I",
> + node->newname);
>
> 120a.
> Simplify this by not doing the assignment to 'objtype'
Modified
> 120b.
> All this can be combined as a single VA() call.
Modified
> 121. deparse_AlterDependStmt
>
> +deparse_AlterDependStmt(Oid objectId, Node *parsetree)
> +{
> + AlterObjectDependsStmt *node = (AlterObjectDependsStmt *) parsetree;
> + ObjTree *alterDependeStmt = NULL;
> +
> +
> + if (node->objectType == OBJECT_INDEX)
>
> Double blank lines?
Modified
> 122.
>
> + alterDependeStmt = new_objtree("ALTER INDEX");
> +
> + qualified = new_objtree_for_qualname(relation->rd_rel->relnamespace,
> + node->relation->relname);
> + append_object_object(alterDependeStmt, "%{identity}D", qualified);
>
> This could be combined into a single VA() call.
>
> In fact, everything could be if the code is refactored a bit better so
> that only the assignment for 'qualified' is within the lock.
>
> SUGGESTION
>
> qualified = new_objtree_for_qualname(relation->rd_rel->relnamespace,
> node->relation->relname);
> relation_close(relation, AccessShareLock);
>
> stmt = new_objtree_VA("ALTER INDEX %{identity}D %{no}s DEPENDS
> ON EXTENSION %{newname}I", 3,
> "identity", ObjTypeObject, qualified,
> "no", ObjTypeString, node->remove ? "NO" : "",
> "newname", strVal(node->extname));
Modified
> 123.
>
> + append_string_object(alterDependeStmt, "%{NO}s",
> + node->remove ? "NO" : "");
>
> IMO it seemed more conventional for the substitution marker to be
> lowercase. So this should say "%{no}s" instead.
Modified
> 124.
>
> AlterObjectDependsStmt *node = (AlterObjectDependsStmt *) parsetree;
> ObjTree *alterDependeStmt = NULL;
>
> Why 'alterDependeStmt' with the extra 'e' -- Is it a typo? Anyway, the
> name seems overkill - just 'stmt' would be fine.
Modified
> 125. GENERAL comments for all the deparse_Seq_XXX functions
>
> Comments common for:
> - deparse_Seq_Cache
> - deparse_Seq_Cycle
> - deparse_Seq_IncrementBy
> - deparse_Seq_Maxvalue
> - deparse_Seq_Minvalue
> - deparse_Seq_OwnedBy
> - deparse_Seq_Restart
> - deparse_Seq_Startwith
>
> 125a
> Most of the deparse_Seq_XXX functions are prefixed with "SET" which is
> needed for ALTER TABLE only.
>
> e.g.
>
> if (alter_table)
> fmt = "SET %{no}s CYCLE";
> else
> fmt = "%{no}s CYCLE";
>
> IMO all these "SET" additions should be done at the point of the call
> when doing the ALTER TABLE instead of polluting all these helper
> functions. Remove the alter_table function parameter.
In this case we have to create a format string and build an object
tree. Since SET is part of the format string, even if we removed
alter_table we would either have to pass SET in as part of the format
string or duplicate these deparse_XXX functions for the ALTER TABLE
case. I prefer the existing approach unless there is an easier way.
> 125b.
> IMO it would be tidier with a blank line before the returns.
Modified
> 125c.
> The function parameter *parent is unused.
Modified
> 126. deparse_RuleStmt
>
> + ruleStmt = new_objtree("CREATE RULE");
> +
> + append_string_object(ruleStmt, "%{or_replace}s",
> + node->replace ? "OR REPLACE" : "");
> +
> + append_string_object(ruleStmt, "%{identity}I",
> + node->rulename);
> +
> + append_string_object(ruleStmt, "AS ON %{event}s",
> + node->event == CMD_SELECT ? "SELECT" :
> + node->event == CMD_UPDATE ? "UPDATE" :
> + node->event == CMD_DELETE ? "DELETE" :
> + node->event == CMD_INSERT ? "INSERT" : "XXX");
> + append_object_object(ruleStmt, "TO %{table}D",
> + new_objtree_for_qualname_id(RelationRelationId,
> + rewrForm->ev_class));
> +
> + append_string_object(ruleStmt, "DO %{instead}s",
> + node->instead ? "INSTEAD" : "ALSO");
>
> I suspect all of this can be combined to be a single VA() function call.
Modified
> 127.
>
> + append_string_object(ruleStmt, "AS ON %{event}s",
> + node->event == CMD_SELECT ? "SELECT" :
> + node->event == CMD_UPDATE ? "UPDATE" :
> + node->event == CMD_DELETE ? "DELETE" :
> + node->event == CMD_INSERT ? "INSERT" : "XXX");
>
> The bogus "XXX" looks a bit dodgy. Probably it would be better to
> assign this 'event_str' separately and Assert/Error if node->event is
> not supported.
Modified
> 128.
>
> + tmp = new_objtree_VA("WHERE %{clause}s", 0);
> +
> + if (qual)
> + append_string_object(tmp, "clause", qual);
> + else
> + {
> + append_null_object(tmp, "clause");
> + append_bool_object(tmp, "present", false);
> + }
> +
> + append_object_object(ruleStmt, "where_clause", tmp);
>
> This doesn't look right to me...
>
> 128a.
> Using VA() with 0 args
Modified
> 128b.
> Using a null object to fudge substitution of "%{clause}s" is avoidable IMO
This code was intended to generate a verbose JSON node for the WHERE
clause, so that the user can easily modify the command by changing the
value of the where clause to generate a new DDL command.
> 128c.
> Shouldn't there be a "%{where_clause}s" on the ruleStmt format?
Modified
> 129. deparse_CreateTransformStmt
>
> + createTransform = new_objtree("CREATE");
> +
> + append_string_object(createTransform, "%{or_replace}s",
> + node->replace ? "OR REPLACE" : "");
> + append_object_object(createTransform, "TRANSFORM FOR %{typename}D",
> + new_objtree_for_qualname_id(TypeRelationId,
> + trfForm->trftype));
> + append_string_object(createTransform, "LANGUAGE %{language}I",
> + NameStr(langForm->lanname));
>
> This can all be combined into a single VA() function.
Modified
> 130.
> + /* deparse the transform_element_list */
> + if (trfForm->trffromsql != InvalidOid)
>
> 130a.
> Uppercase comment
Modified
> 130b.
> Use OidIsValid macro.
Modified
> 131.
>
> + /*
> + * Verbose syntax
> + *
> + * CREATE %{or_replace}s TRANSFORM FOR %{typename}D LANGUAGE
> + * %{language}I ( FROM SQL WITH FUNCTION %{signature}s, TO SQL WITH
> + * FUNCTION %{signature_tof}s )
> + *
> + * OR
> + *
> + * CREATE %{or_replace}s TRANSFORM FOR %{typename}D LANGUAGE
> + * %{language}I ( TO SQL WITH FUNCTION %{signature_tof}s )
> + */
> +
>
> According to the PG DOCS [3] *either* part of FROM/TO SQL WITH
> FUNCTION is optional. So a "FROM SQL" without a "TO SQL" is also
> allowed. So the comment should say this too.
The verbose syntax has been mentioned 3 times; I felt it is not required
to mention it again and again. I have retained it at the beginning and
removed the others.
> 132.
>
> There are multiple other places in this code where should use OidIsValid macro.
>
> e.g.
> + if (trfForm->trftosql != InvalidOid)
>
> e.g.
> + /* Append a ',' if trffromsql is present, else append '(' */
> + append_string_object(createTransform, "%{comma}s",
> + trfForm->trffromsql != InvalidOid ? "," : "(");
Modified
> 133.
> These strange substitutions could've just used the
> append_format_string() function I think.
>
> 133a
> + /* Append a ',' if trffromsql is present, else append '(' */
> + append_string_object(createTransform, "%{comma}s",
> + trfForm->trffromsql != InvalidOid ? "," : "(");
>
> SUGGESTION
> append_format_string(createTransform, OidIsValid(trfForm->trffromsql)
> ? "," : "(");
Modified
> 133b.
> + append_string_object(createTransform, "%{close_bracket}s", ")");
>
> SUGGESTION
> append_format_string(createTransform, ")");
Modified
> 134.
> + sign = new_objtree("");
> +
> + append_object_object(sign, "%{identity}D",
> + new_objtree_for_qualname(procForm->pronamespace,
> + NameStr(procForm->proname)));
> + append_array_object(sign, "(%{arguments:, }s)", params);
> +
> + append_object_object(createTransform, "TO SQL WITH FUNCTION
> %{signature_tof}s", sign);
>
> 134a.
> IIUC it's a bit clunky to parse out this whole fmt looking for a '{'
> to extract the name "signature_tof" (maybe it works but there is a lot
> of inefficiency hidden under the covers I think), when with some small
> refactoring this could be done using a VA() function passing in the
> known name.
Modified
> 134b.
> Looks like 'sign' is either a typo or very misleading name. Isn't that
> supposed to be the ObjTree for the "signature_tof"?
Changed it to signature
> 135. append_literal_or_null
>
> +static void
> +append_literal_or_null(ObjTree *mainobj, char *elemname, char *value)
>
> In other functions 'mainobj' would have been called 'parent'. I think
> parent is a better name.
Modified
> 136.
>
> + top = new_objtree_VA("", 0);
>
> Don't use VA() for 0 args.
It was already fixed
> 137.
>
> + top = new_objtree_VA("", 0);
> + part = new_objtree_VA("NULL", 1,
> + "present", ObjTypeBool,
> + !value);
> + append_object_object(top, "%{null}s", part);
> + part = new_objtree_VA("", 1,
> + "present", ObjTypeBool,
> + !!value);
> + if (value)
> + append_string_object(part, "%{value}L", value);
> + append_object_object(top, "%{literal}s", part);
>
> 137a.
> Suggest to put each VA arg name/value on the same line.
> e.g.
> + part = new_objtree_VA("NULL", 1,
> + "present", ObjTypeBool, !value);
Modified
> 137b.
> The '!!' is cute but seems uncommon technique in PG sources. Maybe
> better just say value != NULL
Modified
> 137c.
> Suggest adding a blank line to separate the logic of the 2 parts.
> (e.g. above the 2nd part = new_objtree_VA).
Modified
> 138. deparse_CommentOnConstraintSmt
>
> + comment = new_objtree("COMMENT ON CONSTRAINT");
> +
> + append_string_object(comment, "%{identity}s",
> pstrdup(NameStr(constrForm->conname)));
> + append_format_string(comment, "ON");
> +
> + if (node->objtype == OBJECT_DOMCONSTRAINT)
> + append_format_string(comment, "DOMAIN");
> +
> + append_string_object(comment, "%{parentobj}s",
> + getObjectIdentity(&addr, false));
>
> This can mostly be done as a single VA() call I think.
Modified
> 139. deparse_CommentStmt
>
> +
> +static ObjTree *
> +deparse_CommentStmt(ObjectAddress address, Node *parsetree)
>
> Missing function comment.
Modified
> 140.
>
> + comment = new_objtree("COMMENT ON");
> + append_string_object(comment, "%{objtype}s", (char *)
> stringify_objtype(node->objtype));
>
> A single VA() function call can do this.
Modified
> 141. deparse_CreateAmStmt
>
> +
> +static ObjTree *
> +deparse_CreateAmStmt(Oid objectId, Node *parsetree)
>
> Missing function comment.
Modified
> 142.
>
> + CreateAm = new_objtree("CREATE ACCESS METHOD");
> + append_string_object(CreateAm, "%{identity}I",
> + NameStr(amForm->amname));
> +
> + switch (amForm->amtype)
> + {
> + case 'i':
> + amtype = "INDEX";
> + break;
> + case 't':
> + amtype = "TABLE";
> + break;
> + default:
> + elog(ERROR, "invalid type %c for access method", amForm->amtype);
> + }
> + append_string_object(CreateAm, "TYPE %{am_type}s", amtype);
> +
> + /* Add the HANDLER clause */
> + append_object_object(CreateAm, "HANDLER %{handler}D",
> + new_objtree_for_qualname_id(ProcedureRelationId,
> + amForm->amhandler));
>
> This entire thing can be done as a single VA() function call.
>
> SUGGESTION
>
> switch (amForm->amtype)
> {
> case 'i':
> amtype = "INDEX";
> break;
> case 't':
> amtype = "TABLE";
> break;
> default:
> elog(ERROR, "invalid type %c for access method", amForm->amtype);
> }
>
> createAm = new_objtree_VA("CREATE ACCESS METHOD %{identity}I TYPE
> %{am_type}s HANDLER %{handler}D", 3,
> "identity", ObjTypeString, NameStr(amForm->amname),
> "am_type", ObjTypeString, amtype,
> "handler", ObjTypeObject,
> new_objtree_for_qualname_id(ProcedureRelationId, amForm->amhandler));
Modified
> 143. deparse_simple_command
>
> + switch (nodeTag(parsetree))
> + {
> + case T_CreateSchemaStmt:
> + command = deparse_CreateSchemaStmt(objectId, parsetree);
> + break;
> +
> + case T_AlterDomainStmt:
> + command = deparse_AlterDomainStmt(objectId, parsetree,
> + cmd->d.simple.secondaryObject);
> + break;
> +
> + case T_CreateStmt:
> + command = deparse_CreateStmt(objectId, parsetree);
> + break;
> +
> + case T_RefreshMatViewStmt:
> + command = deparse_RefreshMatViewStmt(objectId, parsetree);
> + break;
> +
> + case T_CreateTrigStmt:
> + command = deparse_CreateTrigStmt(objectId, parsetree);
> + break;
> +
> + case T_RuleStmt:
> + command = deparse_RuleStmt(objectId, parsetree);
> + break;
> +
> + case T_CreatePLangStmt:
> + command = deparse_CreateLangStmt(objectId, parsetree);
> + break;
> +
> + case T_CreateSeqStmt:
> + command = deparse_CreateSeqStmt(objectId, parsetree);
> + break;
> +
> + case T_CreateFdwStmt:
> + command = deparse_CreateFdwStmt(objectId, parsetree);
> + break;
> +
> + case T_CreateUserMappingStmt:
> + command = deparse_CreateUserMappingStmt(objectId, parsetree);
> + break;
> +
> + case T_AlterUserMappingStmt:
> + command = deparse_AlterUserMappingStmt(objectId, parsetree);
> + break;
> +
> + case T_AlterStatsStmt:
> + command = deparse_AlterStatsStmt(objectId, parsetree);
> + break;
> +
> + case T_AlterFdwStmt:
> + command = deparse_AlterFdwStmt(objectId, parsetree);
> + break;
> +
> + case T_AlterSeqStmt:
> + command = deparse_AlterSeqStmt(objectId, parsetree);
> + break;
> +
> + case T_DefineStmt:
> + command = deparse_DefineStmt(objectId, parsetree,
> + cmd->d.simple.secondaryObject);
> + break;
> +
> + case T_CreateConversionStmt:
> + command = deparse_CreateConversion(objectId, parsetree);
> + break;
> +
> + case T_CreateDomainStmt:
> + command = deparse_CreateDomain(objectId, parsetree);
> + break;
> +
> + case T_CreateExtensionStmt:
> + command = deparse_CreateExtensionStmt(objectId, parsetree);
> + break;
> +
> + case T_AlterExtensionStmt:
> + command = deparse_AlterExtensionStmt(objectId, parsetree);
> + break;
> +
> + case T_AlterExtensionContentsStmt:
> + command = deparse_AlterExtensionContentsStmt(objectId, parsetree,
> + cmd->d.simple.secondaryObject);
> + break;
> +
> + case T_CreateOpFamilyStmt:
> + command = deparse_CreateOpFamily(objectId, parsetree);
> + break;
> +
> + case T_CreatePolicyStmt:
> + command = deparse_CreatePolicyStmt(objectId, parsetree);
> + break;
> +
> + case T_IndexStmt:
> + command = deparse_IndexStmt(objectId, parsetree);
> + break;
> +
> + case T_CreateFunctionStmt:
> + command = deparse_CreateFunction(objectId, parsetree);
> + break;
> +
> + case T_AlterFunctionStmt:
> + command = deparse_AlterFunction(objectId, parsetree);
> + break;
> +
> + case T_AlterCollationStmt:
> + command = deparse_AlterCollation(objectId, parsetree);
> + break;
> +
> + case T_RenameStmt:
> + command = deparse_RenameStmt(cmd->d.simple.address, parsetree);
> + break;
> +
> + case T_AlterObjectDependsStmt:
> + command = deparse_AlterDependStmt(objectId, parsetree);
> + break;
> +
> + case T_AlterObjectSchemaStmt:
> + command = deparse_AlterObjectSchemaStmt(cmd->d.simple.address,
> + parsetree,
> + cmd->d.simple.secondaryObject);
> + break;
> +
> + case T_AlterOwnerStmt:
> + command = deparse_AlterOwnerStmt(cmd->d.simple.address, parsetree);
> + break;
> +
> + case T_AlterOperatorStmt:
> + command = deparse_AlterOperatorStmt(objectId, parsetree);
> + break;
> +
> + case T_AlterPolicyStmt:
> + command = deparse_AlterPolicyStmt(objectId, parsetree);
> + break;
> +
> + case T_AlterTypeStmt:
> + command = deparse_AlterTypeSetStmt(objectId, parsetree);
> + break;
> +
> + case T_CreateStatsStmt:
> + command = deparse_CreateStatisticsStmt(objectId, parsetree);
> + break;
> +
> + case T_CreateForeignServerStmt:
> + command = deparse_CreateForeignServerStmt(objectId, parsetree);
> + break;
> +
> + case T_AlterForeignServerStmt:
> + command = deparse_AlterForeignServerStmt(objectId, parsetree);
> + break;
> +
> + case T_CompositeTypeStmt:
> + command = deparse_CompositeTypeStmt(objectId, parsetree);
> + break;
> +
> + case T_CreateEnumStmt: /* CREATE TYPE AS ENUM */
> + command = deparse_CreateEnumStmt(objectId, parsetree);
> + break;
> +
> + case T_CreateRangeStmt: /* CREATE TYPE AS RANGE */
> + command = deparse_CreateRangeStmt(objectId, parsetree);
> + break;
> +
> + case T_AlterEnumStmt:
> + command = deparse_AlterEnumStmt(objectId, parsetree);
> + break;
> +
> + case T_CreateCastStmt:
> + command = deparse_CreateCastStmt(objectId, parsetree);
> + break;
> +
> + case T_AlterTSDictionaryStmt:
> + command = deparse_AlterTSDictionaryStmt(objectId, parsetree);
> + break;
> +
> + case T_CreateTransformStmt:
> + command = deparse_CreateTransformStmt(objectId, parsetree);
> + break;
> +
> + case T_ViewStmt: /* CREATE VIEW */
> + command = deparse_ViewStmt(objectId, parsetree);
> + break;
> +
> + case T_CreateTableAsStmt: /* CREATE MATERIALIZED VIEW */
> + command = deparse_CreateTableAsStmt_vanilla(objectId, parsetree);
> + break;
> +
> + case T_CommentStmt:
> + command = deparse_CommentStmt(cmd->d.simple.address, parsetree);
> + break;
> +
> + case T_CreateAmStmt:
> + command = deparse_CreateAmStmt(objectId, parsetree);
> + break;
>
> 143a.
> Suggestion to put all these cases in alphabetical order.
Modified
> 143b.
> Suggest removing the variable 'command' and for each case just return
> the deparse_XXX result -- doing this will eliminate the need for
> "break;" and so the function can be 50 lines shorter.
Modified
> 144. deparse_TableElements
>
> + if (tree != NULL)
> + {
> + ObjElem *column;
> +
> + column = new_object_object(tree);
> + elements = lappend(elements, column);
> + }
>
> Why do all this instead of just:
>
> if (tree != NULL)
> elements = lappend(elements, new_object_object(tree));
Modified
> 145. deparse_utility_command
>
> + if (tree)
> + {
> + Jsonb *jsonb;
> +
> + jsonb = objtree_to_jsonb(tree);
> + command = JsonbToCString(&str, &jsonb->root, JSONB_ESTIMATED_LEN);
> + }
> + else
> + command = NULL;
>
> 145a.
> Since 'tree' is always assigned the result of deparse_XXX I am
> wondering if tree == NULL is even possible here? If not then this
> if/else should be an Assert instead.
This is required as the tree can be NULL like in the below case:
/*
* Indexes for PRIMARY KEY and other constraints are output
* separately; return empty here.
*/
> 145b.
> Anyway, maybe assign default command = NULL in the declaration to
> reduce a couple of lines of unnecessary code.
Modified
Thanks for the comments, the attached v39 patch has the changes for the same.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v39-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | text/x-patch | 15.0 KB |
| v39-0004-Test-cases-for-DDL-replication.patch | text/x-patch | 24.6 KB |
| v39-0001-Functions-to-deparse-DDL-commands.patch | text/x-patch | 317.0 KB |
| v39-0002-Support-DDL-replication.patch | text/x-patch | 133.5 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-25 06:17:59 |
| Message-ID: | CALDaNm3YmSHN7-8dfnZ=0B_uM8yRb3O93o+jY0z8KAKqqxy_og@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Sun, 20 Nov 2022 at 09:29, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Fri, 11 Nov 2022 at 11:03, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > On Fri, Nov 11, 2022 at 4:17 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Nov 11, 2022 at 4:09 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > >
> > > > On Fri, Nov 11, 2022 at 3:47 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > > >
> > > > > Here are more review comments for the v32-0001 file ddl_deparse.c
> > > > >
> > > > > *** NOTE - my review post became too big, so I split it into smaller parts.
> > > >
> > >
> >
> > THIS IS PART 4 OF 4.
> >
> > =======
> >
> > src/backend/commands/ddl_deparse.c
>
> Thanks for the comments, the attached v39 patch has the changes for the same.
One comment:
While fixing the review comments, I found that the DEFAULT syntax is not
handled for CREATE DOMAIN:
+ /*
+ * Verbose syntax
+ *
+ * CREATE DOMAIN %{identity}D AS %{type}T %{not_null}s %{constraints}s
+ * %{collation}s
+ */
+ createDomain = new_objtree("CREATE");
+
+ append_object_object(createDomain,
+ "DOMAIN %{identity}D AS",
+
new_objtree_for_qualname_id(TypeRelationId,
+
objectId));
+ append_object_object(createDomain,
+ "%{type}T",
+
new_objtree_for_type(typForm->typbasetype, typForm->typtypmod));
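A possible shape for the missing clause, following the same pattern the patch already uses for column defaults (a sketch only; the typTup/datum/isnull variable names are assumptions, not the actual fix):
```
	/* Sketch: add the DEFAULT clause, if the domain has one */
	datum = SysCacheGetAttr(TYPEOID, typTup, Anum_pg_type_typdefault, &isnull);
	tmp = new_objtree("DEFAULT");
	if (!isnull)
		append_string_object(tmp, "%{default}s", TextDatumGetCString(datum));
	else
		append_bool_object(tmp, "present", false);
	append_object_object(createDomain, "%{default}s", tmp);
```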
Regards,
Vignesh
| From: | li jie <ggysxcq(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-25 09:06:44 |
| Message-ID: | CAGfChW4gijmx19d0qwJaRctgVRmjN_YvyQTCs48TOFKTF_BGYQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hi Developer,
I have been following this patch for a long time.
Recently, I started testing it and found several bugs
that I want to report as feedback.
1. CREATE TABLE LIKE
I found that this case may be replicated incorrectly.
You can run the following SQL statement:
```
CREATE TABLE ctlt1 (a text CHECK (length(a) > 2) PRIMARY KEY, b text);
ALTER TABLE ctlt1 ALTER COLUMN a SET STORAGE MAIN;
ALTER TABLE ctlt1 ALTER COLUMN b SET STORAGE EXTERNAL;
CREATE TABLE ctlt1_like (LIKE ctlt1 INCLUDING ALL);
```
The ctlt1_like table will not be replicated correctly.
I think this is because the CREATE TABLE LIKE statement is captured by
the event trigger as a CREATE TABLE statement plus multiple ALTER TABLE
statements.
These overlap, and an error is reported when the downstream
replication applies them.
2. ALTER TABLE (inherits)
case:
```
CREATE TABLE gtest30 (
a int,
b int GENERATED ALWAYS AS (a * 2) STORED
);
CREATE TABLE gtest30_1 () INHERITS (gtest30);
ALTER TABLE gtest30 ALTER COLUMN b DROP EXPRESSION;
```
After this case is executed on the publisher, the following error occurs
on the subscriber:
ERROR: column "b" of relation "gtest30" is not a stored generated column
STATEMENT: ALTER TABLE public.gtest30 ALTER COLUMN b DROP EXPRESSION, ALTER
COLUMN b DROP EXPRESSION
Obviously, the column modifications of the inherited table were also
captured, and the wrong statement was then deparsed.
I believe that such errors may also occur in other ALTER TABLE
subcommand scenarios involving inherited tables.
3. ALTER TABLE SET STATISTICS
case:
```
CREATE TABLE test_stat (a int);
ALTER TABLE test_stat ALTER a SET STATISTICS -1;
```
After this case is executed on the publisher, the following error occurs
on the subscriber:
syntax error at or near "4294967295" at character 60
STATEMENT: ALTER TABLE public.test_stat ALTER COLUMN a SET STATISTICS
4294967295
I guess this is an overflow in a signed-to-unsigned integer conversion; a small illustration follows below.
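A minimal standalone illustration of the suspected signed/unsigned mix-up (this is not code from the patch, it only reproduces the symptom):
```
#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	int32_t		newtarget = -1;		/* SET STATISTICS -1 means "reset to default" */

	/* Formatting the value through an unsigned type yields the bogus number */
	printf("%u\n", (uint32_t) newtarget);	/* prints 4294967295 */

	/* Keeping it signed preserves -1, which is what should be deparsed */
	printf("%d\n", newtarget);		/* prints -1 */

	return 0;
}
```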
4. json null string coredump
case:
```
CREATE OR REPLACE FUNCTION test_ddl_deparse_full()
RETURNS event_trigger LANGUAGE plpgsql AS
$$
DECLARE
r record;
deparsed_json text;
BEGIN
FOR r IN SELECT * FROM pg_event_trigger_ddl_commands()
LOOP
deparsed_json = ddl_deparse_to_json(r.command);
RAISE NOTICE 'deparsed json: %', deparsed_json;
RAISE NOTICE 're-formed command: %',
ddl_deparse_expand_command(deparsed_json);
END LOOP;
END;
$$;
CREATE EVENT TRIGGER test_ddl_deparse_full
ON ddl_command_end EXECUTE PROCEDURE test_ddl_deparse_full();
CREATE SCHEMA AUTHORIZATION postgres;
```
If the preceding case is executed, a core dump occurs; it is related to
a NULL string and is reproducible.
I hope this feedback is helpful to you.
We sincerely hope you complete the DDL logical replication feature.
Regards, Adger
vignesh C <vignesh21(at)gmail(dot)com> wrote on Fri, 25 Nov 2022 at 14:18:
> On Sun, 20 Nov 2022 at 09:29, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Fri, 11 Nov 2022 at 11:03, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Nov 11, 2022 at 4:17 PM Peter Smith <smithpb2250(at)gmail(dot)com>
> wrote:
> > > >
> > > > On Fri, Nov 11, 2022 at 4:09 PM Peter Smith <smithpb2250(at)gmail(dot)com>
> wrote:
> > > > >
> > > > > On Fri, Nov 11, 2022 at 3:47 PM Peter Smith <smithpb2250(at)gmail(dot)com>
> wrote:
> > > > > >
> > > > > > Here are more review comments for the v32-0001 file ddl_deparse.c
> > > > > >
> > > > > > *** NOTE - my review post became too big, so I split it into
> smaller parts.
> > > > >
> > > >
> > >
> > > THIS IS PART 4 OF 4.
> > >
> > > =======
> > >
> > > src/backend/commands/ddl_deparse.c
> >
> > Thanks for the comments, the attached v39 patch has the changes for the
> same.
>
> One comment:
> While fixing review comments, I found that default syntax is not
> handled for create domain:
> + /*
> + * Verbose syntax
> + *
> + * CREATE DOMAIN %{identity}D AS %{type}T %{not_null}s
> %{constraints}s
> + * %{collation}s
> + */
> + createDomain = new_objtree("CREATE");
> +
> + append_object_object(createDomain,
> + "DOMAIN %{identity}D AS",
> +
> new_objtree_for_qualname_id(TypeRelationId,
> +
> objectId));
> + append_object_object(createDomain,
> + "%{type}T",
> +
> new_objtree_for_type(typForm->typbasetype, typForm->typtypmod));
>
> Regards,
> Vignesh
>
>
>
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-25 10:30:39 |
| Message-ID: | CALDaNm1WEnw2Oykb90PO1c4oDAVrAR+16W8Cm_F-KzgNvqmmKg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, 11 Nov 2022 at 10:39, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Fri, Nov 11, 2022 at 3:47 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > Here are more review comments for the v32-0001 file ddl_deparse.c
> >
> > *** NOTE - my review post became too big, so I split it into smaller parts.
>
> THIS IS PART 2 OF 4.
>
> =======
>
> src/backend/commands/ddl_deparse.c
>
> 1. deparse_AlterExtensionStmt
>
> +/*
> + * Deparse an AlterExtensionStmt (ALTER EXTENSION .. UPDATE TO VERSION)
> + *
> + * Given an extension OID and a parse tree that modified it, return an ObjTree
> + * representing the alter type.
> + */
> +static ObjTree *
> +deparse_AlterExtensionStmt(Oid objectId, Node *parsetree)
>
> Spurious blank space before "OID"
Modified
> 2.
>
> + ObjTree *stmt;
> + ObjTree *tmp;
> + List *list = NIL;
> + ListCell *cell;
>
> Variable 'tmp' can be declared only in the scope that it is used.
Modified
> 3.
>
> + foreach(cell, node->options)
> + {
> + DefElem *opt = (DefElem *) lfirst(cell);
> +
> + if (strcmp(opt->defname, "new_version") == 0)
> + {
> + tmp = new_objtree_VA("TO %{version}L", 2,
> + "type", ObjTypeString, "version",
> + "version", ObjTypeString, defGetString(opt));
> + list = lappend(list, new_object_object(tmp));
> + }
> + else
> + elog(ERROR, "unsupported option %s", opt->defname);
> + }
>
> This code seems strange to be adding new versions to a list. How can
> there be multiple new versions? It does not seem compatible with the
> command syntax [1]
Modified
> 4. deparse_CreateCastStmt
>
> + initStringInfo(&func);
> + appendStringInfo(&func, "%s(",
> + quote_qualified_identifier(get_namespace_name(funcForm->pronamespace),
> + NameStr(funcForm->proname)));
> + for (i = 0; i < funcForm->pronargs; i++)
> + appendStringInfoString(&func,
> + format_type_be_qualified(funcForm->proargtypes.values[i]));
> + appendStringInfoChar(&func, ')');
>
> Is this correct, or should there be some separators (e.g. commas)
> between multiple arg-types?
Modified
> 5. deparse_AlterDefaultPrivilegesStmt
>
> +
> +static ObjTree *
> +deparse_AlterDefaultPrivilegesStmt(CollectedCommand *cmd)
>
> Missing function comment
Modified
> 6.
>
> + schemas = lappend(schemas,
> + new_string_object(strVal(val)));
>
> Unnecessary wrapping.
Modified
> 7.
>
> + /* Add the IN SCHEMA clause, if any */
> + tmp = new_objtree("IN SCHEMA");
> + append_array_object(tmp, "%{schemas:, }I", schemas);
> + if (schemas == NIL)
> + append_bool_object(tmp, "present", false);
> + append_object_object(alterStmt, "%{in_schema}s", tmp);
> +
> + /* Add the FOR ROLE clause, if any */
> + tmp = new_objtree("FOR ROLE");
> + append_array_object(tmp, "%{roles:, }R", roles);
> + if (roles == NIL)
> + append_bool_object(tmp, "present", false);
> + append_object_object(alterStmt, "%{for_roles}s", tmp);
>
>
> I don't really understand why the logic prefers to add a whole new
> empty tree with "present: false" versus just adding nothing at all
> unless it is relevant.
This code was intended to generate a verbose JSON node, so that the user
can easily modify the command by changing the value to generate a new
DDL command.
> 8.
>
> + if (stmt->action->is_grant)
> + grant = new_objtree("GRANT");
> + else
> + grant = new_objtree("REVOKE");
> +
> + /* add the GRANT OPTION clause for REVOKE subcommand */
> + if (!stmt->action->is_grant)
> + {
> + tmp = new_objtree_VA("GRANT OPTION FOR",
> + 1, "present", ObjTypeBool,
> + stmt->action->grant_option);
> + append_object_object(grant, "%{grant_option}s", tmp);
> + }
>
> That 2nd 'if' can just be combined with the 'else' logic of the prior if.
Modified
> 9.
>
> + Assert(priv->cols == NIL);
> + privs = lappend(privs,
> + new_string_object(priv->priv_name));
>
> Unnecessary wrapping.
Modified
> 10. deparse_AlterTableStmt
>
> Maybe this function name should be different because it is not only
> for TABLEs but also serves for INDEX, VIEW, TYPE, etc
Changed the function name
> 11.
>
> AFAICT every case in the switch (subcmd->subtype) is doing subcmds =
> lappend(subcmds, new_object_object(tmpobj));
>
> Just doing this in common code at the end might be an easy way to
> remove ~50 lines of duplicate code.
There are a few cases where we don't do anything:
case AT_ReAddIndex:
case AT_ReAddConstraint:
case AT_ReAddComment:
case AT_ReplaceRelOptions:
case AT_CheckNotNull:
case AT_ReAddStatistics:
/* Subtypes used for internal operations; nothing to do here */
break;
case AT_AddColumnToView:
/* CREATE OR REPLACE VIEW -- nothing to do here */
break;
This lappend cannot be moved to the end because it is not applicable in all of the cases.
> 12. deparse_ColumnDef
>
> + * NOT NULL constraints in the column definition are emitted directly in the
> + * column definition by this routine; other constraints must be emitted
> + * elsewhere (the info in the parse node is incomplete anyway.).
> + */
> +static ObjTree *
> +deparse_ColumnDef(Relation relation, List *dpcontext, bool composite,
> + ColumnDef *coldef, bool is_alter, List **exprs)
>
> "anyway.)." -> "anyway)."
Modified
> 13.
>
> + /* USING clause */
> + tmpobj = new_objtree("COMPRESSION");
> + if (coldef->compression)
> + append_string_object(tmpobj, "%{compression_method}I", coldef->compression);
> + else
> + {
> + append_null_object(tmpobj, "%{compression_method}I");
> + append_bool_object(tmpobj, "present", false);
> + }
>
> Why is it necessary to specify a NULL compression method if the entire
> "COMPRESSION" is anyway flagged as present=false?
This code was intended to generate a verbose JSON node, so that the user
can easily modify the command by changing the value to generate a new
DDL command.
> 14.
>
> + foreach(cell, coldef->constraints)
> + {
> + Constraint *constr = (Constraint *) lfirst(cell);
> +
> + if (constr->contype == CONSTR_NOTNULL)
> + saw_notnull = true;
> + }
>
> Why not break immediately from this loop the first time you find
> 'saw_notnull' true?
Modified
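That is, roughly (sketch):
```
	foreach(cell, coldef->constraints)
	{
		Constraint *constr = (Constraint *) lfirst(cell);

		if (constr->contype == CONSTR_NOTNULL)
		{
			saw_notnull = true;
			break;
		}
	}
```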
> 15.
>
> + tmpobj = new_objtree("DEFAULT");
> + if (attrForm->atthasdef)
> + {
> + char *defstr;
> +
> + defstr = RelationGetColumnDefault(relation, attrForm->attnum,
> + dpcontext, exprs);
> +
> + append_string_object(tmpobj, "%{default}s", defstr);
> + }
> + else
> + append_bool_object(tmpobj, "present", false);
> + append_object_object(column, "%{default}s", tmpobj);
>
> Something seems a bit strange here. It looks like there are formats
> called "%{default}s" at 2 levels in this tree, so will it cause a
> hierarchy of objects with the same name?
Yes, it will have contents something like:
"default": {"fmt": "DEFAULT %{default}s", "default": "11"}}}],
I did not find any issues even though it has the same name; it is able
to replicate the statement without any issue.
> 16. deparse_ColumnIdentity
>
> + column = new_objtree("");
> +
> + if (!OidIsValid(seqrelid))
> + {
> + append_bool_object(column, "present", false);
> + return column;
> + }
>
> I don't really understand the point of making empty tree structures
> for not "present" elements. IIUC this is just going to make the tree
> bigger for no reason and all these not "present" branches will be
> ultimately thrown away, right? I guess the justification is that it
> might be for debugging/documentation but that does not really stand up
> in this case because it seems like just a nameless tree here.
Modified
> 17. deparse_CreateDomain
>
> + createDomain = new_objtree("CREATE");
> +
> + append_object_object(createDomain,
> + "DOMAIN %{identity}D AS",
> + new_objtree_for_qualname_id(TypeRelationId,
> + objectId));
> + append_object_object(createDomain,
> + "%{type}T",
> + new_objtree_for_type(typForm->typbasetype, typForm->typtypmod));
> +
> + if (typForm->typnotnull)
> + append_string_object(createDomain, "%{not_null}s", "NOT NULL");
> + else
> + append_string_object(createDomain, "%{not_null}s", "");
>
> 17a.
> I don't understand why this is not just a single _VA() call instead of
> spread over multiple append_objects like this.
Modified
> 17b.
> In other places, something like the "%{not_null}s" is done with a
> ternary operator instead of the excessive if/else.
Modified
> 18. deparse_CreateFunction
>
> + if (isnull)
> + probin = NULL;
> + else
> + {
> + probin = TextDatumGetCString(tmpdatum);
> + if (probin[0] == '\0' || strcmp(probin, "-") == 0)
> + probin = NULL;
> + }
>
> Maybe it is simpler to assign prbin = NULL where it is declared, then
> here you only need to test the !isnull case.
Modified
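The change is along these lines (sketch; with probin initialized to NULL in its declaration, only the non-NULL case needs handling):
```
	char	   *probin = NULL;

	if (!isnull)
	{
		probin = TextDatumGetCString(tmpdatum);
		if (probin[0] == '\0' || strcmp(probin, "-") == 0)
			probin = NULL;
	}
```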
> 19.
>
> + append_string_object(createFunc, "%{or_replace}s",
> + node->replace ? "OR REPLACE" : "");
>
> It is not clear to me what is the point of such code - I mean if
> node->replace is false why do append at all? ... Why not use
> appen_format_string() instead()?
> My guess is that this way is preferred to simplify the calling code,
> but knowing that a "" value will just do nothing anyway - seems an
> overcomplicated way to do it though.
We generally use this along with new_objtree_VA, but in this case it was
not. I have changed it to use new_objtree_VA so that the calls can be
combined.
> 20.
>
> + typarray = palloc(list_length(node->parameters) * sizeof(Oid));
> + if (list_length(node->parameters) > procForm->pronargs)
> + {
> + Datum alltypes;
> + Datum *values;
> + bool *nulls;
> + int nelems;
> +
> + alltypes = SysCacheGetAttr(PROCOID, procTup,
> + Anum_pg_proc_proallargtypes, &isnull);
> + if (isnull)
> + elog(ERROR, "NULL proallargtypes, but more parameters than args");
> + deconstruct_array(DatumGetArrayTypeP(alltypes),
> + OIDOID, 4, 't', 'i',
> + &values, &nulls, &nelems);
> + if (nelems != list_length(node->parameters))
> + elog(ERROR, "mismatched proallargatypes");
> + for (i = 0; i < list_length(node->parameters); i++)
> + typarray[i] = values[i];
> + }
> + else
> + {
> + for (i = 0; i < list_length(node->parameters); i++)
> + typarray[i] = procForm->proargtypes.values[i];
> + }
>
> The list_length(node->parameters) is used multiple times here; it
> might have been cleaner code to assign that to some local variable.
Modified
> 21.
>
> + * Note that %{name}s is a string here, not an identifier; the reason
> + * for this is that an absent parameter name must produce an empty
> + * string, not "", which is what would happen if we were to use
> + * %{name}I here. So we add another level of indirection to allow us
> + * to inject a "present" parameter.
> + */
>
> The above comment says:
> must produce an empty string, not ""
>
> I didn't get the point - what is the difference between an empty string and ""?
if we specify as %{variable}s and if variable is specified as "", the
append_XXX functions will add a " " while deparsing json to string in
the subscriber side. It is not intended to add " " in this case.
I have changed it to below which is better and being followed
similarly in other places too:
if (param->name)
append_string_object(name, "%{name}I", param->name);
else
{
append_null_object(name, "%{name}I");
append_bool_object(name, "present", false);
}
append_object_object(paramobj, "%{name}s", name);
I have removed the confusing comment and changed the code to the above,
which is the common way this scenario is handled.
> 22.
>
> + append_string_object(paramobj, "%{mode}s",
> + param->mode == FUNC_PARAM_IN ? "IN" :
> + param->mode == FUNC_PARAM_OUT ? "OUT" :
> + param->mode == FUNC_PARAM_INOUT ? "INOUT" :
> + param->mode == FUNC_PARAM_VARIADIC ? "VARIADIC" :
> + "IN");
>
> There doesn't seem to be much point to test for param->mode ==
> FUNC_PARAM_IN here since "IN" is the default mode anyway.
Modified
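So the call becomes roughly (sketch):
```
	append_string_object(paramobj, "%{mode}s",
						 param->mode == FUNC_PARAM_OUT ? "OUT" :
						 param->mode == FUNC_PARAM_INOUT ? "INOUT" :
						 param->mode == FUNC_PARAM_VARIADIC ? "VARIADIC" :
						 "IN");
```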
> 23.
>
> + name = new_objtree("");
> + append_string_object(name, "%{name}I",
> + param->name ? param->name : "NULL");
> +
> + append_bool_object(name, "present",
> + param->name ? true : false);
>
> IIUC it is uncommon to inject a "present" object if it was "true", so
> why do it like that here?
Modified to keep it similar to what we do in other places
> 24.
>
> + append_format_string(tmpobj, "(");
> + append_array_object(tmpobj, "%{arguments:, }s", params);
> + append_format_string(tmpobj, ")");
>
> Is it necessary to do that in 3 lines? IIUC it would be the same if
> the parens were just included in the append_array_object format,
> right?
There is a possibility that the arguments list is NIL; if it is,
append_array_object will not add the "(" and ")". That is the reason we
need to keep them separate.
> 25.
>
> + if (procForm->prosupport)
> + {
> + Oid argtypes[1];
> +
> + /*
> + * We should qualify the support function's name if it wouldn't be
> + * resolved by lookup in the current search path.
> + */
> + argtypes[0] = INTERNALOID;
>
> Might as well just declare this as:
>
> Oid argtypes[] = { INTERNALOID };
Modified
> 26. deparse_CreateOpClassStmt
>
> +
> + stmt = new_objtree_VA("CREATE OPERATOR CLASS %{identity}D", 1,
> + "identity", ObjTypeObject,
> + new_objtree_for_qualname(opcForm->opcnamespace,
> + NameStr(opcForm->opcname)));
> +
> + /* Add the DEFAULT clause */
> + append_string_object(stmt, "%{default}s",
> + opcForm->opcdefault ? "DEFAULT" : "");
> +
> + /* Add the FOR TYPE clause */
> + append_object_object(stmt, "FOR TYPE %{type}T",
> + new_objtree_for_type(opcForm->opcintype, -1));
> +
> + /* Add the USING clause */
> + append_string_object(stmt, "USING %{amname}I",
> get_am_name(opcForm->opcmethod));
>
> This can all be done just as a single VA call I think.
Modified
> 27.
>
> + append_format_string(tmpobj, "(");
> + append_array_object(tmpobj, "%{argtypes:, }T", arglist);
> + append_format_string(tmpobj, ")");
>
> AFAIK this can just be done by a single call including the parens in
> the format string of appen_array_object.
There is a possibility that the arguments list is NIL; if it is,
append_array_object will not add the "(" and ")". That is the reason we
need to keep them separate.
> 28. deparse_CreatePolicyStmt
>
> +
> +static ObjTree *
> +deparse_CreatePolicyStmt(Oid objectId, Node *parsetree)
>
> Missing function comment.
Modified
> 29.
>
> + /* Add the rest of the stuff */
> + add_policy_clauses(policy, objectId, node->roles, !!node->qual,
> + !!node->with_check);
>
> The !! to cast the pointer parameter to boolean is cute, but IIUC that
> is not commonly used in the PG source. Maybe it is more conventional
> to just pass node->qual != NULL etc?
Modified
> 30. deparse_AlterPolicyStmt
>
> +
> +static ObjTree *
> +deparse_AlterPolicyStmt(Oid objectId, Node *parsetree)
>
> Missing function comment.
Modified
> 31.
>
> + /* Add the rest of the stuff */
> + add_policy_clauses(policy, objectId, node->roles, !!node->qual,
> + !!node->with_check);
>
> The !! to cast the pointer parameter to boolean is cute, but IIUC that
> technique is not commonly used in the PG source. Maybe it is more
> conventional to just pass node->qual != NULL etc?
Modified
> 32. deparse_CreateSchemaStmt
>
> + else
> + {
> + append_null_object(auth, "%{authorization_role}I ");
> + append_bool_object(auth, "present", false);
> + }
>
> 32a.
> Why append a NULL object if the "present" says it is false anyway?
This code was intended to generate a verbose JSON node, so that the user
can easily modify the command by changing the value to generate a new
DDL command.
> 32b.
> "%{authorization_role}I " -- why do they have extra space on the end?
> Just let the append_XXX functions can take care of the space
> separators automagically instead.
Modified
> 33. deparse_AlterDomainStmt
>
> + {
> + fmt = "ALTER DOMAIN";
> + type = "drop default";
> + alterDom = new_objtree_VA(fmt, 1, "type", ObjTypeString, type);
>
> This code style of assigning the 'fmt' and 'type' like this is not
> typical of all the other deparse_XXX functions which just pass
> parameter literals. Also, I see no good reason that the 'fmt' is
> unconditionally assigned to "ALTER DOMAIN" in 6 different places.
Removed it
> 34.
>
> AFAICT all these cases can be simplified to use single VA() calls and
> remove all the append_XXX.
Modified
> 35.
>
> +
> + break;
> + case 'N':
>
> Spurious or misplaced blank line.
Modified
> 36.
>
> + case 'C':
> +
> + /*
> + * ADD CONSTRAINT. Only CHECK constraints are supported by
> + * domains
> + */
>
> A spurious blank line is inconsistent with the other cases.
Modified
> 36.
>
> +
> + break;
> + default:
>
> Spurious or misplaced blank line.
Modified
> 37. deparse_CreateStatisticsStmt
>
> + append_format_string(createStat, "FROM");
> +
> + append_object_object(createStat, "%{stat_table_identity}D",
> + new_objtree_for_qualname(get_rel_namespace(statform->stxrelid),
> + get_rel_name(statform->stxrelid)));
>
> It would be easier to do things like this using a single call using a
> format of "FROM %{stat_table_identity}D", rather than have the extra
> append_format_string call.
Modified
> 38. deparse_CreateForeignServerStmt
>
> + /* Add a TYPE clause, if any */
> + tmp = new_objtree_VA("TYPE", 0);
> + if (node->servertype)
> + append_string_object(tmp, "%{type}L", node->servertype);
> + else
> + append_bool_object(tmp, "present", false);
> + append_object_object(createServer, "%{type}s", tmp);
> +
> + /* Add a VERSION clause, if any */
> + tmp = new_objtree_VA("VERSION", 0);
>
> Why use the VA() function if passing 0 args?
Modified
> 39.
>
> + append_string_object(createServer, "FOREIGN DATA WRAPPER %{fdw}I",
> node->fdwname);
> + /* add an OPTIONS clause, if any */
> + append_object_object(createServer, "%{generic_options}s",
> + deparse_FdwOptions(node->options, NULL));
>
> 39a.
> Use uppercase comment.
Modified
> 39b.
> Missing blank line above comment?
Modified
> 40. deparse_AlterForeignServerStmt
>
> + /* Add a VERSION clause, if any */
> + tmp = new_objtree_VA("VERSION", 0);
>
> Why use the VA() function if passing 0 args?
This was already fixed
> 41.
>
> + /* Add a VERSION clause, if any */
> + tmp = new_objtree_VA("VERSION", 0);
> + if (node->has_version && node->version)
> + append_string_object(tmp, "%{version}L", node->version);
> + else if (node->has_version)
> + append_string_object(tmp, "version", "NULL");
> + else
> + append_bool_object(tmp, "present", false);
> + append_object_object(alterServer, "%{version}s", tmp);
> +
> + /* Add a VERSION clause, if any */
> + tmp = new_objtree_VA("VERSION", 0);
> + if (node->has_version && node->version)
> + append_string_object(tmp, "%{version}L", node->version);
> + else if (node->has_version)
> + append_string_object(tmp, "version", "NULL");
> + else
> + append_bool_object(tmp, "present", false);
> + append_object_object(alterServer, "%{version}s", tmp);
>
> Huh? Looks like a cut/paste error of duplicate VERSION clauses. Is this correct?
Removed the duplicate code
> 42. deparse_CreateStmt
>
> + if (tableelts == NIL)
> + {
> + tmpobj = new_objtree("");
> + append_bool_object(tmpobj, "present", false);
> + }
> + else
> + tmpobj = new_objtree_VA("(%{elements:, }s)", 1,
> + "elements", ObjTypeArray, tableelts);
>
> This fragment seemed a bit complicated. IIUC this is the same as just:
>
> tmpobj = new_objtree("");
> if (tableelts)
> append_array_object(tmpobj, "(%{elements:, }s)", tableelts);
> else
> append_bool_object(tmpobj, "present", false);
Modified
> 43.
>
> + tmpobj = new_objtree("INHERITS");
> + if (list_length(node->inhRelations) > 0)
> + append_array_object(tmpobj, "(%{parents:, }D)",
> deparse_InhRelations(objectId));
> + else
> + {
> + append_null_object(tmpobj, "(%{parents:, }D)");
> + append_bool_object(tmpobj, "present", false);
> + }
> + append_object_object(createStmt, "%{inherits}s", tmpobj);
>
> 43a.
> AFAIK convention for checking non-empty List is just "if
> (node->inhRelations != NIL)" or simply "if (node->inhRelations)
Modified
> 43b.
> Maybe I misunderstand something but I don't see why append_null_object
> is needed for tree marked as "present"=false anyhow. This similar
> pattern happens multiple times in this function.
This code was intended to generate a verbose JSON node, so that the user
can easily modify the command by changing the value to generate a new
DDL command.
> 44. deparse_DefineStmt
>
> + switch (define->kind)
> + {
>
> IMO better to put all these OBJECT_XXX cases in alphabetical order
> instead of just random.
Modified
> 45.
>
> + default:
> + elog(ERROR, "unsupported object kind");
> + }
>
> Should this also log what the define->kind was attempted?
Modified
> 46. deparse_DefineStmt_Collation
>
> + stmt = new_objtree_VA("CREATE COLLATION", 0);
> +
> + append_object_object(stmt, "%{identity}D",
> + new_objtree_for_qualname(colForm->collnamespace,
> + NameStr(colForm->collname)));
>
> Why not combine there to avoid VA args with 0 and use VA args with 1 instead?
Modified
> 47.
>
> + if (fromCollid.objectId != InvalidOid)
>
> Use OisIsValid macro.
Modified
> 48.
>
> + append_object_object(stmt, "FROM %{from_identity}D",
> + new_objtree_for_qualname(fromColForm->collnamespace,
> + NameStr(fromColForm->collname)));
> +
> +
> + ReleaseSysCache(tp);
> + ReleaseSysCache(colTup);
> + return stmt;
>
> Extra blank line.
Modified
> 49.
>
> + if (!isnull)
> + {
> + tmp = new_objtree_VA("LOCALE=", 1,
> + "clause", ObjTypeString, "locale");
> + append_string_object(tmp, "%{locale}L",
> + psprintf("%s", TextDatumGetCString(datum)));
>
> IMO it should be easy enough to express this using a single VA(2 args)
> function, so avoiding the extra append_string. e.g. other functions
> like deparse_DefineStmt_Operator do this.
>
> And this same comment also applies to the rest of this function:
> - tmp = new_objtree_VA("LC_COLLATE=", 1,
> - tmp = new_objtree_VA("LC_CTYPE=", 1,
> - tmp = new_objtree_VA("PROVIDER=", 1,
> - tmp = new_objtree_VA("PROVIDER=", 1,
> - tmp = new_objtree_VA("DETERMINISTIC=", 1,
> - tmp = new_objtree_VA("VERSION=", 1,
Modified
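For example, following the VA pattern used elsewhere in the file, the LOCALE clause can be built in one call, something like (a sketch, not the exact v40 code):
```
	tmp = new_objtree_VA("LOCALE= %{locale}L", 2,
						 "clause", ObjTypeString, "locale",
						 "locale", ObjTypeString,
						 TextDatumGetCString(datum));
```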
Thanks for the comments, the attached v40 patch has the changes for the same.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v40-0004-Test-cases-for-DDL-replication.patch | text/x-patch | 24.6 KB |
| v40-0001-Functions-to-deparse-DDL-commands.patch | text/x-patch | 316.8 KB |
| v40-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | text/x-patch | 15.0 KB |
| v40-0002-Support-DDL-replication.patch | text/x-patch | 133.5 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-25 11:00:00 |
| Message-ID: | CALDaNm0phmj_Nf6wxpx+aw5TWoFU34zFbV1OqupPd5KeFswVFA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, 25 Nov 2022 at 11:47, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Sun, 20 Nov 2022 at 09:29, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Fri, 11 Nov 2022 at 11:03, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Nov 11, 2022 at 4:17 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > >
> > > > On Fri, Nov 11, 2022 at 4:09 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Fri, Nov 11, 2022 at 3:47 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > Here are more review comments for the v32-0001 file ddl_deparse.c
> > > > > >
> > > > > > *** NOTE - my review post became too big, so I split it into smaller parts.
> > > > >
> > > >
> > >
> > > THIS IS PART 4 OF 4.
> > >
> > > =======
> > >
> > > src/backend/commands/ddl_deparse.c
> >
> > Thanks for the comments, the attached v39 patch has the changes for the same.
>
> One comment:
> While fixing review comments, I found that default syntax is not
> handled for create domain:
> + /*
> + * Verbose syntax
> + *
> + * CREATE DOMAIN %{identity}D AS %{type}T %{not_null}s %{constraints}s
> + * %{collation}s
> + */
> + createDomain = new_objtree("CREATE");
> +
> + append_object_object(createDomain,
> + "DOMAIN %{identity}D AS",
> +
> new_objtree_for_qualname_id(TypeRelationId,
> +
> objectId));
> + append_object_object(createDomain,
> + "%{type}T",
> +
> new_objtree_for_type(typForm->typbasetype, typForm->typtypmod));
I have fixed this issue in the v40 patch attached at [1].
[1] - https://www.postgresql.org/message-id/CALDaNm1WEnw2Oykb90PO1c4oDAVrAR%2B16W8Cm_F-KzgNvqmmKg%40mail.gmail.com
Regards,
Vignesh
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | li jie <ggysxcq(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-25 22:23:09 |
| Message-ID: | CAAD30UJnjQkU+3Kbfg1P=-k9uf7o2N3pPB34uu-OXyN3+o_f-g@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hello,
Thanks for the feedback.
> I have been following this patch for a long time.
> Recently, I started to try to test it. I found several bugs
> here and want to give you feedback.
>
> 1. CREATE TABLE LIKE
> I found that this case may be repication incorrectly.
> You can run the following SQL statement:
> ```
> CREATE TABLE ctlt1 (a text CHECK (length(a) > 2) PRIMARY KEY, b text);
> ALTER TABLE ctlt1 ALTER COLUMN a SET STORAGE MAIN;
> ALTER TABLE ctlt1 ALTER COLUMN b SET STORAGE EXTERNAL;
> CREATE TABLE ctlt1_like (LIKE ctlt1 INCLUDING ALL);
> ```
> The ctlt1_like table will not be able to correct the replication.
> I think this is because create table like statement is captured by
> the event trigger to a create table statement and multiple alter table statements.
> There are some overlaps between them, and an error is reported when downstream replication occurs.
I looked into this case. The root cause is that the statement
CREATE TABLE ctlt1_like (LIKE ctlt1 INCLUDING ALL);
is executed internally using 3 DDLs:
1. CREATE TABLE ctlt1_like (LIKE ctlt1 INCLUDING ALL); --The top level command
2. ALTER TABLE ctlt1_like ADD CONSTRAINT ctlt1_a_check CHECK
(length(a) > 2); --The first subcommand
3. CREATE UNIQUE INDEX ctlt1_like_pkey on ctlt1_like (a); --The second
subcommand that creates the primary key index
All three commands are captured by the event trigger. The first and
second commands end up getting deparsed, WAL-logged and
replayed on the subscriber. The replay of the ALTER TABLE command
causes a duplicate constraint error. The problem is that
while subcommands are captured by event triggers by default, they
don't need to be deparsed and WAL-logged for DDL replication.
To do that we can pass the isCompleteQuery variable in
ProcessUtilitySlow to EventTriggerCollectSimpleCommand() and
EventTriggerAlterTableEnd() and make this information available in
CollectedCommand so that any subcommands can be skipped.
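To make the intent concrete, here is a tiny standalone model of that filter (the names are placeholders for illustration, not the patch's actual structures):
```
#include <stdbool.h>
#include <stdio.h>

/*
 * Standalone model, NOT the patch's CollectedCommand: each collected command
 * remembers whether it came from a top-level statement, and DDL replication
 * only deparses and WAL-logs those.
 */
typedef struct ModelCollectedCommand
{
	const char *commandText;
	bool		isTopLevel;		/* would be filled from ProcessUtilitySlow */
} ModelCollectedCommand;

static bool
should_deparse_for_replication(const ModelCollectedCommand *cmd)
{
	return cmd->isTopLevel;
}

int
main(void)
{
	ModelCollectedCommand cmds[] = {
		{"CREATE TABLE ctlt1_like (LIKE ctlt1 INCLUDING ALL)", true},
		{"ALTER TABLE ctlt1_like ADD CONSTRAINT ctlt1_a_check CHECK (length(a) > 2)", false},
		{"CREATE UNIQUE INDEX ctlt1_like_pkey ON ctlt1_like (a)", false},
	};

	for (size_t i = 0; i < sizeof(cmds) / sizeof(cmds[0]); i++)
		printf("%s -> %s\n", cmds[i].commandText,
			   should_deparse_for_replication(&cmds[i]) ? "deparse and WAL-log" : "skip");

	return 0;
}
```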
Thoughts?
Zheng
| From: | li jie <ggysxcq(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-28 03:27:03 |
| Message-ID: | CAGfChW7-qMXzgVH18b1vpetXV1A8-M0g7VHPwqyRbb9YarTtRA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
>
>
> All three commands are captured by the event trigger. The first and
> second command ends up getting deparsed, WAL-logged and
> replayed on the subscriber. The replay of the ALTER TABLE command
> causes a duplicate constraint error. The problem is that
> while subcommands are captured by event triggers by default, they
> don't need to be deparsed and WAL-logged for DDL replication.
> To do that we can pass the isCompleteQuery variable in
> ProcessUtilitySlow to EventTriggerCollectSimpleCommand() and
> EventTriggerAlterTableEnd() and make this information available in
> CollectedCommand so that any subcommands can be skipped.
>
We may not be able to skip all subcommands.
For example, 'ALTER TABLE ctlt1_like ALTER COLUMN b SET STORAGE EXTERNAL;'
cannot be represented in the CREATE TABLE statement.
Regards, Adger
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | li jie <ggysxcq(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-28 03:28:57 |
| Message-ID: | CAAD30UJ25nTPiVc0RTnsVbhHSNrnoqoackf9++Na+R-QN6dRkw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Nov 25, 2022 at 5:23 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> Hello,
>
> Thanks for the feedback.
>
> > I have been following this patch for a long time.
> > Recently, I started to try to test it. I found several bugs
> > here and want to give you feedback.
> >
> > 1. CREATE TABLE LIKE
> > I found that this case may be repication incorrectly.
> > You can run the following SQL statement:
> > ```
> > CREATE TABLE ctlt1 (a text CHECK (length(a) > 2) PRIMARY KEY, b text);
> > ALTER TABLE ctlt1 ALTER COLUMN a SET STORAGE MAIN;
> > ALTER TABLE ctlt1 ALTER COLUMN b SET STORAGE EXTERNAL;
> > CREATE TABLE ctlt1_like (LIKE ctlt1 INCLUDING ALL);
> > ```
> > The ctlt1_like table will not be able to correct the replication.
> > I think this is because create table like statement is captured by
> > the event trigger to a create table statement and multiple alter table statements.
> > There are some overlaps between them, and an error is reported when downstream replication occurs.
>
> I looked into this case. The root cause is the statement
>
> CREATE TABLE ctlt1_like (LIKE ctlt1 INCLUDING ALL);
>
> is executed internally using 3 DDLs:
> 1. CREATE TABLE ctlt1_like (LIKE ctlt1 INCLUDING ALL); --The top level command
> 2. ALTER TABLE ctlt1_like ADD CONSTRAINT ctlt1_a_check CHECK
> (length(a) > 2); --The first subcommand
> 3. CREATE UNIQUE INDEX ctlt1_like_pkey on ctlt1_like (a); --The second
> subcommand that creates the primary key index
>
> All three commands are captured by the event trigger. The first and
> second command ends up getting deparsed, WAL-logged and
> replayed on the subscriber. The replay of the ALTER TABLE command
> causes a duplicate constraint error. The problem is that
> while subcommands are captured by event triggers by default, they
> don't need to be deparsed and WAL-logged for DDL replication.
> To do that we can pass the isCompleteQuery variable in
> ProcessUtilitySlow to EventTriggerCollectSimpleCommand() and
> EventTriggerAlterTableEnd() and make this information available in
> CollectedCommand so that any subcommands can be skipped.
Attaching the proposed fix in
v40-0005-Do-not-generate-WAL-log-for-non-top-level-DDL-comman.patch.
This patch adds a new boolean field isTopLevelCommand to
CollectedCommand so that non-top-level commands
can be skipped in the DDL replication event trigger functions. The
patch also makes the information available by
passing the isTopLevel variable in ProcessUtilitySlow to several
EventTriggerCollect functions such as
EventTriggerCollectSimpleCommand and EventTriggerAlterTableStart.
> 2. ALTER TABLE (inherits)
> case:
> ```
> CREATE TABLE gtest30 (
> a int,
> b int GENERATED ALWAYS AS (a * 2) STORED
> );
> CREATE TABLE gtest30_1 () INHERITS (gtest30);
> ALTER TABLE gtest30 ALTER COLUMN b DROP EXPRESSION;
> ```
> After this case is executed in the publication, the following error occurs in the subscription :
>
> ERROR: column "b" of relation "gtest30" is not a stored generated column
> STATEMENT: ALTER TABLE public.gtest30 ALTER COLUMN b DROP EXPRESSION, ALTER COLUMN b DROP EXPRESSION
>
> Obviously, the column modifications of the inherited table were also captured,
Yes, I can confirm that the column modifications of the inherited
table gtest30_1 were also captured as a subcommand of "ALTER TABLE
gtest30 ALTER COLUMN b DROP EXPRESSION;". This feels wrong to me,
because the subcommand "ALTER COLUMN b DROP EXPRESSION" is collected
for ALTER TABLE gtest30 but it's actually meant for the inherited
table gtest30_1. I think we should fix the capture of the subcommand
in a way that we know it's meant to be executed on the inherited table
gtest30_1.
Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| v40-0002-Support-DDL-replication.patch | application/octet-stream | 133.5 KB |
| v40-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v40-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v40-0005-Do-not-generate-WAL-log-for-non-top-level-DDL-comman.patch | application/octet-stream | 29.2 KB |
| v40-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 316.8 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | li jie <ggysxcq(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-28 04:47:39 |
| Message-ID: | CAAD30U+d+pwsGEaNv_NT9SH4bfgFT6_zSrkv3qjLFT9_qNx5cA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> > >
> > > 1. CREATE TABLE LIKE
> > > I found that this case may be repication incorrectly.
> > > You can run the following SQL statement:
> > > ```
> > > CREATE TABLE ctlt1 (a text CHECK (length(a) > 2) PRIMARY KEY, b text);
> > > ALTER TABLE ctlt1 ALTER COLUMN a SET STORAGE MAIN;
> > > ALTER TABLE ctlt1 ALTER COLUMN b SET STORAGE EXTERNAL;
> > > CREATE TABLE ctlt1_like (LIKE ctlt1 INCLUDING ALL);
> > > ```
> > > The ctlt1_like table will not be able to correct the replication.
> > > I think this is because create table like statement is captured by
> > > the event trigger to a create table statement and multiple alter table statements.
> > > There are some overlaps between them, and an error is reported when downstream replication occurs.
> >
> > I looked into this case. The root cause is the statement
> >
> > CREATE TABLE ctlt1_like (LIKE ctlt1 INCLUDING ALL);
> >
> > is executed internally using 3 DDLs:
> > 1. CREATE TABLE ctlt1_like (LIKE ctlt1 INCLUDING ALL); --The top level command
> > 2. ALTER TABLE ctlt1_like ADD CONSTRAINT ctlt1_a_check CHECK
> > (length(a) > 2); --The first subcommand
> > 3. CREATE UNIQUE INDEX ctlt1_like_pkey on ctlt1_like (a); --The second
> > subcommand that creates the primary key index
> >
> > All three commands are captured by the event trigger. The first and
> > second command ends up getting deparsed, WAL-logged and
> > replayed on the subscriber. The replay of the ALTER TABLE command
> > causes a duplicate constraint error. The problem is that
> > while subcommands are captured by event triggers by default, they
> > don't need to be deparsed and WAL-logged for DDL replication.
> > To do that we can pass the isCompleteQuery variable in
> > ProcessUtilitySlow to EventTriggerCollectSimpleCommand() and
> > EventTriggerAlterTableEnd() and make this information available in
> > CollectedCommand so that any subcommands can be skipped.
>
> Attaching the proposed fix in
> v40-0005-Do-not-generate-WAL-log-for-non-top-level-DDL-comman.patch.
> This patch adds a new boolean field isTopLevelCommand to
> CollectedCommand so that non-top level command
> can be skipped in the DDL replication event trigger functions. The
> patch also makes the information available by
> passing the isTopLevel variable in ProcessUtilitySlow to several
> EventTriggerCollect functions such as
> EventTriggerCollectSimpleCommand and EventTriggerAlterTableStart.
Patch v40-0005-Do-not-generate-WAL-log-for-non-top-level-DDL-comman.patch
broke the following test case:
CREATE TABLE product (id int PRIMARY KEY, name text);
CREATE TABLE orders (order_id int PRIMARY KEY, product_id int
REFERENCES product (id));
Because the foreign key constraint was not deparsed as part of CREATE TABLE
but was instead replicated as a non-top-level command (which we
no longer replicate with patch v40-0005), I fixed this in the attached patch:
diff --git a/src/backend/commands/ddl_deparse.c
b/src/backend/commands/ddl_deparse.c
index 89f33d451c..d9bb3aab8b 100755
--- a/src/backend/commands/ddl_deparse.c
+++ b/src/backend/commands/ddl_deparse.c
@@ -1087,8 +1087,6 @@ objtree_to_jsonb_rec(ObjTree *tree,
JsonbParseState *state)
* the given elements list. The updated list is returned.
*
* This works for typed tables, regular tables, and domains.
- *
- * Note that CONSTRAINT_FOREIGN constraints are always ignored.
*/
static List *
obtainConstraints(List *elements, Oid relationId, Oid domainId)
@@ -1146,7 +1144,8 @@ obtainConstraints(List *elements, Oid
relationId, Oid domainId)
contype = "check";
break;
case CONSTRAINT_FOREIGN:
- continue; /* not here */
+ contype = "foreign key";
+ break;
case CONSTRAINT_PRIMARY:
contype = "primary key";
break;
Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| v40-0002-Support-DDL-replication.patch | application/octet-stream | 133.5 KB |
| v40-0005-Do-not-generate-WAL-log-for-non-top-level-DDL-comman.patch | application/octet-stream | 30.1 KB |
| v40-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v40-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 316.8 KB |
| v40-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| From: | rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | li jie <ggysxcq(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-29 02:29:03 |
| Message-ID: | CADgiWi7x0SXEXAtKdMHAfnp5ZGC_ghQtoVcV3jf_ZgfArbDqMA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
One question:
I understand that we create/enable the triggers as part of the CREATE
PUBLICATION command flow. I am wondering how this works in the case of
logical replication using slots.
thanks
Rajesh
On Mon, Nov 28, 2022 at 10:17 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> > > >
> > > > 1. CREATE TABLE LIKE
> > > > I found that this case may be repication incorrectly.
> > > > You can run the following SQL statement:
> > > > ```
> > > > CREATE TABLE ctlt1 (a text CHECK (length(a) > 2) PRIMARY KEY, b text);
> > > > ALTER TABLE ctlt1 ALTER COLUMN a SET STORAGE MAIN;
> > > > ALTER TABLE ctlt1 ALTER COLUMN b SET STORAGE EXTERNAL;
> > > > CREATE TABLE ctlt1_like (LIKE ctlt1 INCLUDING ALL);
> > > > ```
> > > > The ctlt1_like table will not be able to correct the replication.
> > > > I think this is because create table like statement is captured by
> > > > the event trigger to a create table statement and multiple alter table statements.
> > > > There are some overlaps between them, and an error is reported when downstream replication occurs.
> > >
> > > I looked into this case. The root cause is the statement
> > >
> > > CREATE TABLE ctlt1_like (LIKE ctlt1 INCLUDING ALL);
> > >
> > > is executed internally using 3 DDLs:
> > > 1. CREATE TABLE ctlt1_like (LIKE ctlt1 INCLUDING ALL); --The top level command
> > > 2. ALTER TABLE ctlt1_like ADD CONSTRAINT ctlt1_a_check CHECK
> > > (length(a) > 2); --The first subcommand
> > > 3. CREATE UNIQUE INDEX ctlt1_like_pkey on ctlt1_like (a); --The second
> > > subcommand that creates the primary key index
> > >
> > > All three commands are captured by the event trigger. The first and
> > > second command ends up getting deparsed, WAL-logged and
> > > replayed on the subscriber. The replay of the ALTER TABLE command
> > > causes a duplicate constraint error. The problem is that
> > > while subcommands are captured by event triggers by default, they
> > > don't need to be deparsed and WAL-logged for DDL replication.
> > > To do that we can pass the isCompleteQuery variable in
> > > ProcessUtilitySlow to EventTriggerCollectSimpleCommand() and
> > > EventTriggerAlterTableEnd() and make this information available in
> > > CollectedCommand so that any subcommands can be skipped.
> >
> > Attaching the proposed fix in
> > v40-0005-Do-not-generate-WAL-log-for-non-top-level-DDL-comman.patch.
> > This patch adds a new boolean field isTopLevelCommand to
> > CollectedCommand so that non-top level command
> > can be skipped in the DDL replication event trigger functions. The
> > patch also makes the information available by
> > passing the isTopLevel variable in ProcessUtilitySlow to several
> > EventTriggerCollect functions such as
> > EventTriggerCollectSimpleCommand and EventTriggerAlterTableStart.
>
> Patch v40-0005-Do-not-generate-WAL-log-for-non-top-level-DDL-comman.patch
> broke the following test case:
>
> CREATE TABLE product (id int PRIMARY KEY, name text);
> CREATE TABLE orders (order_id int PRIMARY KEY, product_id int
> REFERENCES product (id));
>
> Because forein key constraint was not deparsed as part of CREATE TABLE
> but rather replicated as a non-top-level command (which we
> no longer replicate in patch v40-0005), fixed this in the attached patch:
>
> diff --git a/src/backend/commands/ddl_deparse.c
> b/src/backend/commands/ddl_deparse.c
> index 89f33d451c..d9bb3aab8b 100755
> --- a/src/backend/commands/ddl_deparse.c
> +++ b/src/backend/commands/ddl_deparse.c
> @@ -1087,8 +1087,6 @@ objtree_to_jsonb_rec(ObjTree *tree,
> JsonbParseState *state)
> * the given elements list. The updated list is returned.
> *
> * This works for typed tables, regular tables, and domains.
> - *
> - * Note that CONSTRAINT_FOREIGN constraints are always ignored.
> */
> static List *
> obtainConstraints(List *elements, Oid relationId, Oid domainId)
> @@ -1146,7 +1144,8 @@ obtainConstraints(List *elements, Oid
> relationId, Oid domainId)
> contype = "check";
> break;
> case CONSTRAINT_FOREIGN:
> - continue; /* not here */
> + contype = "foreign key";
> + break;
> case CONSTRAINT_PRIMARY:
> contype = "primary key";
> break;
>
> Regards,
> Zheng
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-29 04:25:35 |
| Message-ID: | CAFPTHDbunLJ8MA5gJ9Ym89EFAHfW7=nSOi42k9jQG+Z4ZgT1Fg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, Nov 29, 2022 at 1:29 PM rajesh singarapu
<rajesh(dot)rs0541(at)gmail(dot)com> wrote:
>
> One question,
>
> I understand that we create/enable triggers on create publication command flow.
> I am wondering how this works in case of logical replication using slots.
>
>
> thanks
> Rajesh
>
Rajesh,
The trigger functions, when invoked, write these DDL commands to WAL,
and the logical decoding WAL sender that is registered for that
replication slot decodes the WAL-logged DDL commands and sends them as
logical replication messages to the subscriber side. The apply worker
on the subscriber side then converts these messages to actual DDL
commands and executes them.
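To make that flow concrete, here is a minimal sketch that uses only the
existing generic logical message machinery (pg_logical_emit_message plus
the test_decoding plugin), not the DDL patch itself; it assumes
wal_level = logical, and the string merely stands in for a captured DDL
command:

```sql
-- Create a slot whose output plugin will decode WAL for us.
SELECT pg_create_logical_replication_slot('ddl_demo_slot', 'test_decoding');

-- WAL-log a generic logical message; pretend the payload is a DDL string.
SELECT pg_logical_emit_message(true, 'ddl_demo',
                               'CREATE TABLE public.foo (id int)');

-- The decoder attached to the slot sees the message in commit order,
-- which is the same path a WAL-logged DDL message would take.
SELECT data FROM pg_logical_slot_get_changes('ddl_demo_slot', NULL, NULL);

-- Clean up.
SELECT pg_drop_replication_slot('ddl_demo_slot');
```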
regards,
Ajin Cherian
Fujitsu Australia
| From: | rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-29 04:39:33 |
| Message-ID: | CADgiWi49b_NYZOYZ8xBQM0JV887_Vta5q53cHF5ZArzNf9hd4w@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Thanks Ajin for the reply.
We "Create/Install" these trigger function at the time of "Create
publication", (CreatePublication())
but If I create a replication slot using something like "select * from
pg_create_logical_replication_slot('test1', 'test_decoding')"
we would not install these triggers in the system, so we dont get DDLs
decoded, right ?
I am a bit new to this postgres, is there anything missing in my understanding ?
thanks
Raejsh
On Tue, Nov 29, 2022 at 9:55 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Tue, Nov 29, 2022 at 1:29 PM rajesh singarapu
> <rajesh(dot)rs0541(at)gmail(dot)com> wrote:
> >
> > One question,
> >
> > I understand that we create/enable triggers on create publication command flow.
> > I am wondering how this works in case of logical replication using slots.
> >
> >
> > thanks
> > Rajesh
> >
> Rajesh,
>
> The triggers functions when invoked write these ddl commands to WAL
> and the logical decoding WAL sender which is
> registered for that replication slot decodes the WAL logged DDL
> commands and sends them as logical replication
> messages to the subscriber side. The apply worker on the subscriber
> side, then converts these messages to actual
> DDL commands and executes them.
>
> regards,
> Ajin Cherian
> Fujitsu Australia
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-29 04:44:42 |
| Message-ID: | CAFPTHDZFn-zoH+fWqPqniY6GiHiqHqK=4P4tGBYXDB317zRg1A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, Nov 29, 2022 at 3:39 PM rajesh singarapu
<rajesh(dot)rs0541(at)gmail(dot)com> wrote:
>
> Thanks Ajin for the reply.
>
> We "Create/Install" these trigger function at the time of "Create
> publication", (CreatePublication())
> but If I create a replication slot using something like "select * from
> pg_create_logical_replication_slot('test1', 'test_decoding')"
> we would not install these triggers in the system, so we dont get DDLs
> decoded, right ?
>
> I am a bit new to this postgres, is there anything missing in my understanding ?
>
> thanks
> Raejsh
>
Currently this feature is only supported using "Create publication".
We have not added
a slot level parameter to trigger this.
regards,
Ajin Cherian
Fujitsu Australia
| From: | rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-29 05:05:32 |
| Message-ID: | CADgiWi69u=-9XFa98gjL-76b0_esgio2LVfzVsj9WyJr0yKY7w@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Thanks Ajin for the clarification.
This is really a helpful feature.
On Tue, Nov 29, 2022 at 10:14 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Tue, Nov 29, 2022 at 3:39 PM rajesh singarapu
> <rajesh(dot)rs0541(at)gmail(dot)com> wrote:
> >
> > Thanks Ajin for the reply.
> >
> > We "Create/Install" these trigger function at the time of "Create
> > publication", (CreatePublication())
> > but If I create a replication slot using something like "select * from
> > pg_create_logical_replication_slot('test1', 'test_decoding')"
> > we would not install these triggers in the system, so we dont get DDLs
> > decoded, right ?
> >
> > I am a bit new to this postgres, is there anything missing in my understanding ?
> >
> > thanks
> > Raejsh
> >
>
> Currently this feature is only supported using "Create publication".
> We have not added
> a slot level parameter to trigger this.
>
> regards,
> Ajin Cherian
> Fujitsu Australia
| From: | rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-29 05:22:34 |
| Message-ID: | CADgiWi7KXmGGZE9FigoWHhiCecyLf618kBq9brEVUF3kKZ=REQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Isn't it a good idea to move the triggers to CreateReplicationSlot()?
As create publication also creates a replication slot, AFAIK.
thanks
Rajesh
On Tue, Nov 29, 2022 at 10:35 AM rajesh singarapu
<rajesh(dot)rs0541(at)gmail(dot)com> wrote:
>
> Thanks Ajin for the clarification.
>
> This is really a helpful feature.
>
> On Tue, Nov 29, 2022 at 10:14 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> > On Tue, Nov 29, 2022 at 3:39 PM rajesh singarapu
> > <rajesh(dot)rs0541(at)gmail(dot)com> wrote:
> > >
> > > Thanks Ajin for the reply.
> > >
> > > We "Create/Install" these trigger function at the time of "Create
> > > publication", (CreatePublication())
> > > but If I create a replication slot using something like "select * from
> > > pg_create_logical_replication_slot('test1', 'test_decoding')"
> > > we would not install these triggers in the system, so we dont get DDLs
> > > decoded, right ?
> > >
> > > I am a bit new to this postgres, is there anything missing in my understanding ?
> > >
> > > thanks
> > > Raejsh
> > >
> >
> > Currently this feature is only supported using "Create publication".
> > We have not added
> > a slot level parameter to trigger this.
> >
> > regards,
> > Ajin Cherian
> > Fujitsu Australia
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-29 06:39:29 |
| Message-ID: | CAFPTHDZnCfkHotvkxnDEn9w7g4nTwbhgc9DyJZam3u56Dx+t+A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, Nov 29, 2022 at 4:22 PM rajesh singarapu
<rajesh(dot)rs0541(at)gmail(dot)com> wrote:
>
> Isn't it a good idea to move triggers to CreateReplicationSlot() ? as
> create publication also create replication slot, AFAIK.
>
> thanks
> Rajesh
>
Currently we're trying to get this to work using "Create Publication";
maybe in the future
we'll consider adding it as part of the replication slot parameters.
regards,
Ajin Cherian
Fujitsu Australia
| From: | li jie <ggysxcq(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-29 12:21:07 |
| Message-ID: | CAGfChW4zpk9LaMav+nLazgjTLTBfD-XXr-M1H2C0bE0pQgR_fQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
I will continue to give feedback for this patch.
1. LIKE STORAGE
```
CREATE TABLE ctlt (a text, c text);
ALTER TABLE ctlt ALTER COLUMN c SET STORAGE EXTERNAL;
CREATE TABLE ctlt_storage (LIKE ctlt INCLUDING STORAGE);
```
postgres=# \d+ ctlt_storage
                                  Table "public.ctlt_storage"
 Column | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+------+-----------+----------+---------+----------+-------------+--------------+-------------
 a      | text |           |          |         | extended |             |              |
 c      | text |           |          |         | extended |             |              |
As can be seen, the storage attribute of column c of table
ctlt_storage is not replicated.
When the CREATE TABLE LIKE statement is converted,
the LIKE STORAGE attribute is lost because it is difficult to express
in the CREATE TABLE syntax.
Maybe we need to emit an additional statement for it, like 'ALTER TABLE
ctlt_storage ALTER COLUMN c SET STORAGE EXTERNAL;' (a catalog query to
double-check the storage setting is sketched after item 2 below).
2. A subcommand referencing a dropped column.
```
create table another (f1 int, f2 text, f3 text);
alter table another
alter f1 type text using f2 || ' and ' || f3 || ' more',
alter f2 type bigint using f1 * 10,
drop column f3;
```
The following error occurs downstream:
ERROR: column "?dropped?column?" does not exist at character 206
STATEMENT: ALTER TABLE public.another DROP COLUMN f3 , ALTER COLUMN
f1 SET DATA TYPE pg_catalog.text COLLATE pg_catalog."default" USING
(((f2 OPERATOR(pg_catalog.||) ' and '::pg_catalog.text)
OPERATOR(pg_catalog.||) "?dropped?column?") OPERATOR(pg_catalog.||) '
more'::pg_catalog.text), ALTER COLUMN f2 SET DATA TYPE pg_catalog.int8
USING (f1 OPERATOR(pg_catalog.*) 10)
Obviously, column f3 has been deleted and its name no longer exists.
Maybe we need to capture and save the expression in advance, as is done
for dropped objects. However, ALTER TABLE is complex, and this problem
also occurs in other similar scenarios.
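As mentioned under item 1, here is a rough catalog query one could run on
both the publisher and the subscriber to compare per-column storage
settings (plain SQL against pg_attribute, nothing specific to the patch;
attstorage is 'x' for extended and 'e' for external):

```sql
SELECT attname, attstorage
FROM   pg_attribute
WHERE  attrelid = 'public.ctlt_storage'::regclass
  AND  attnum > 0
  AND  NOT attisdropped;
```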
Thoughts? Adger.
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | li jie <ggysxcq(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-29 14:35:13 |
| Message-ID: | CALDaNm170kTzHShPgJ+Ta4Tjf3pma4npEvRVe8yeF5TRLS8QhA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, 29 Nov 2022 at 17:51, li jie <ggysxcq(at)gmail(dot)com> wrote:
>
> I will continue to give feedback for this patch.
Thanks a lot, that will be very helpful for us.
> 1. LIKE STORAGE
> ```
> CREATE TABLE ctlt (a text, c text);
> ALTER TABLE ctlt ALTER COLUMN c SET STORAGE EXTERNAL;
> CREATE TABLE ctlt_storage (LIKE ctlt INCLUDING STORAGE);
> ```
>
> postgres=# \d+ ctlt_storage
>                                   Table "public.ctlt_storage"
>  Column | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
> --------+------+-----------+----------+---------+----------+-------------+--------------+-------------
>  a      | text |           |          |         | extended |             |              |
>  c      | text |           |          |         | extended |             |              |
>
>
> It can be seen that the storage attribute in column C of table
> ctlt_storage is not replicated.
>
> After the CREATE TABLE LIKE statement is converted,
> the LIKE STORAGE attribute is lost because it is difficult to display
> it in the CREATE TABLE syntax.
> Maybe we need to add a statement to it, like 'ALTER TABLE ctlt_storage
> ALTER COLUMN c SET STORAGE EXTERNAL;'.
>
> 2. Reference subcommand be dropped.
> ```
> create table another (f1 int, f2 text, f3 text);
>
> alter table another
> alter f1 type text using f2 || ' and ' || f3 || ' more',
> alter f2 type bigint using f1 * 10,
> drop column f3;
> ```
>
> The following error occurs downstream:
> ERROR: column "?dropped?column?" does not exist at character 206
> STATEMENT: ALTER TABLE public.another DROP COLUMN f3 , ALTER COLUMN
> f1 SET DATA TYPE pg_catalog.text COLLATE pg_catalog."default" USING
> (((f2 OPERATOR(pg_catalog.||) ' and '::pg_catalog.text)
> OPERATOR(pg_catalog.||) "?dropped?column?") OPERATOR(pg_catalog.||) '
> more'::pg_catalog.text), ALTER COLUMN f2 SET DATA TYPE pg_catalog.int8
> USING (f1 OPERATOR(pg_catalog.*) 10)
>
> Obviously, column f3 has been deleted and its name no longer exists.
> Maybe we need to keep it and save it in advance like a drop object.
> However, ATLER TABLE is complex, and this problem also occurs in
> other similar scenarios.
I will analyze these issues and post a patch to handle them.
Regards,
Vignesh
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | li jie <ggysxcq(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-11-29 18:31:49 |
| Message-ID: | CALDaNm09Cxu-fUtNmAWmEerbcqJmS_ktVzDz39m=qfLYqYqH-A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, 29 Nov 2022 at 17:51, li jie <ggysxcq(at)gmail(dot)com> wrote:
>
> I will continue to give feedback for this patch.
>
> 1. LIKE STORAGE
> ```
> CREATE TABLE ctlt (a text, c text);
> ALTER TABLE ctlt ALTER COLUMN c SET STORAGE EXTERNAL;
> CREATE TABLE ctlt_storage (LIKE ctlt INCLUDING STORAGE);
> ```
>
> postgres=# \d+ ctlt_storage
>                                   Table "public.ctlt_storage"
>  Column | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
> --------+------+-----------+----------+---------+----------+-------------+--------------+-------------
>  a      | text |           |          |         | extended |             |              |
>  c      | text |           |          |         | extended |             |              |
>
>
> It can be seen that the storage attribute in column C of table
> ctlt_storage is not replicated.
>
> After the CREATE TABLE LIKE statement is converted,
> the LIKE STORAGE attribute is lost because it is difficult to display
> it in the CREATE TABLE syntax.
> Maybe we need to add a statement to it, like 'ALTER TABLE ctlt_storage
> ALTER COLUMN c SET STORAGE EXTERNAL;'.
This is fixed with the attached patch.
> 2. Reference subcommand be dropped.
> ```
> create table another (f1 int, f2 text, f3 text);
>
> alter table another
> alter f1 type text using f2 || ' and ' || f3 || ' more',
> alter f2 type bigint using f1 * 10,
> drop column f3;
> ```
>
> The following error occurs downstream:
> ERROR: column "?dropped?column?" does not exist at character 206
> STATEMENT: ALTER TABLE public.another DROP COLUMN f3 , ALTER COLUMN
> f1 SET DATA TYPE pg_catalog.text COLLATE pg_catalog."default" USING
> (((f2 OPERATOR(pg_catalog.||) ' and '::pg_catalog.text)
> OPERATOR(pg_catalog.||) "?dropped?column?") OPERATOR(pg_catalog.||) '
> more'::pg_catalog.text), ALTER COLUMN f2 SET DATA TYPE pg_catalog.int8
> USING (f1 OPERATOR(pg_catalog.*) 10)
>
> Obviously, column f3 has been deleted and its name no longer exists.
> Maybe we need to keep it and save it in advance like a drop object.
> However, ATLER TABLE is complex, and this problem also occurs in
> other similar scenarios.
This is slightly tricky; we will fix it in the next version.
A couple of other reported issues are also fixed in this patch:
3. ALTER TABLE SET STATISTICS
CREATE TABLE test_stat (a int);
ALTER TABLE test_stat ALTER a SET STATISTICS -1;
4. Core dump on a null string during JSON deparsing
CREATE OR REPLACE FUNCTION test_ddl_deparse_full()
RETURNS event_trigger LANGUAGE plpgsql AS
$$
DECLARE
r record;
deparsed_json text;
BEGIN
FOR r IN SELECT * FROM pg_event_trigger_ddl_commands()
LOOP
deparsed_json = ddl_deparse_to_json(r.command);
RAISE NOTICE 'deparsed json: %', deparsed_json;
RAISE NOTICE 're-formed command: %', ddl_deparse_expand_command(deparsed_json);
END LOOP;
END;
$$;
CREATE EVENT TRIGGER test_ddl_deparse_full
ON ddl_command_end EXECUTE PROCEDURE test_ddl_deparse_full();
CREATE SCHEMA AUTHORIZATION postgres;
The attached v41 patch has the fixes for the above 3 issues.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v41-0002-Support-DDL-replication.patch | text/x-patch | 133.5 KB |
| v41-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | text/x-patch | 15.0 KB |
| v41-0004-Test-cases-for-DDL-replication.patch | text/x-patch | 24.6 KB |
| v41-0001-Functions-to-deparse-DDL-commands.patch | text/x-patch | 317.2 KB |
| v41-0005-Do-not-generate-WAL-log-for-non-top-level-DDL-co.patch | text/x-patch | 30.1 KB |
| From: | li jie <ggysxcq(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-01 13:03:02 |
| Message-ID: | CAGfChW4w-Nbag8js_QmZQMqDBZ7Sgz=28BhQV0oka5VDyJhwWw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
I applied patch 0005.
I think this modification is a bit overdone.
This design skips all subcommands, which results in many ddl
replication failures.
For example:
```
CREATE TABLE datatype_table (id SERIAL);
```
deparsed ddl is:
CREATE TABLE public.datatype_table (id pg_catalog.int4 STORAGE plain
NOT NULL DEFAULT
pg_catalog.nextval('public.datatype_table_id_seq'::pg_catalog.regclass))
CREATE SEQUENCE subcommand will be skipped.
OR:
```
CREATE SCHEMA element_test
CREATE TABLE foo (id int)
CREATE VIEW bar AS SELECT * FROM foo;
```
deparsed ddl is:
CREATE SCHEMA element_test.
Its subcommands will be skipped.
There may be other cases.
The original CREATE TABLE ... LIKE statement is a special case:
it internally derives ALTER TABLE ... ALTER COLUMN subcommands,
so skipping just those may be enough,
instead of skipping the subcommands of all statements.
After all, our design is to obtain the actual DDL information from the
catalog instead of parsing the raw parse tree,
which is why we cannot skip all subcommands.
Do you have any better ideas?
Regards, Adger.
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | li jie <ggysxcq(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-02 02:46:25 |
| Message-ID: | CAAD30UJ+8rZnGs+w9pmXjTwjPOxFvrcbVGjejy4cL4+oBmQmAw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> I applied patch 0005.
>
> I think this modification is a bit overdone.
> This design skips all subcommands, which results in many ddl
> replication failures.
> For example:
> ```
> CREATE TABLE datatype_table (id SERIAL);
> ```
> deparsed ddl is:
> CREATE TABLE public.datatype_table (id pg_catalog.int4 STORAGE plain
> NOT NULL DEFAULT
> pg_catalog.nextval('public.datatype_table_id_seq'::pg_catalog.regclass))
> CREATE SEQUENCE subcommand will be skipped.
>
> OR:
> ```
> CREATE SCHEMA element_test
> CREATE TABLE foo (id int)
> CREATE VIEW bar AS SELECT * FROM foo;
> ```
> deparsed ddl is:
> CREATE SCHEMA element_test.
>
> Its subcommands will be skipped.
> There may be other cases.
>
> For the initial CREATE LIKE statement, It is special,
> It derives the subcommand of alter table column.
> Just skipping them may be enough.
> Instead of skipping subcommands of all statements.
> After all, our design is to obtain the actual ddl information from the
> catalog instead of parsing raw parsetree.
> This is why we cannot skip all subcommands.
Agreed. Event triggers capture commands at the subcommand level, so
we can't skip subcommands wholesale without also changing how
top-level commands such as CREATE TABLE/INDEX/SCHEMA are deparsed.
Also, for special statements like CREATE TABLE datatype_table (id SERIAL);
the CREATE SEQUENCE subcommand actually has to execute before
the top-level CREATE TABLE command.
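A quick way to see this on a stock server is an event trigger that just
lists what pg_event_trigger_ddl_commands() collected; this is only an
illustrative sketch (the trigger and function names are made up), not
part of the patch:

```sql
CREATE OR REPLACE FUNCTION show_collected_commands()
RETURNS event_trigger LANGUAGE plpgsql AS
$$
DECLARE
  r record;
BEGIN
  FOR r IN SELECT command_tag, object_identity
             FROM pg_event_trigger_ddl_commands()
  LOOP
    RAISE NOTICE 'collected: % (%)', r.command_tag, r.object_identity;
  END LOOP;
END;
$$;

CREATE EVENT TRIGGER show_collected_commands
  ON ddl_command_end EXECUTE PROCEDURE show_collected_commands();

-- Typically emits notices for the implicit CREATE SEQUENCE as well as the
-- CREATE TABLE itself (and the sequence's OWNED BY adjustment), which is
-- why skipping every subcommand is too broad.
CREATE TABLE datatype_table (id SERIAL);

DROP EVENT TRIGGER show_collected_commands;
```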
Attached please find a new solution that skips the deparsing of ALTER TABLE
subcommands generated for TableLikeClause. The patch v42-0005 added a new
boolean field table_like to AlterTableStmt in order to identify an ALTER TABLE
subcommand generated internally for the TableLikeClause.
Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| v42-0005-Skip-ALTER-TABLE-subcommands-generated-for-TableLike.patch | application/octet-stream | 2.2 KB |
| v42-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v42-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v42-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 317.2 KB |
| v42-0002-Support-DDL-replication.patch | application/octet-stream | 133.5 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | li jie <ggysxcq(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-02 23:48:59 |
| Message-ID: | CAAD30UJw=4s-BezskPH6XieigAVBdsO9dCGQddzwC7z_7MxUaQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> 2. ALTER TABLE (inherits)
> case:
> ```
> CREATE TABLE gtest30 (
> a int,
> b int GENERATED ALWAYS AS (a * 2) STORED
> );
> CREATE TABLE gtest30_1 () INHERITS (gtest30);
> ALTER TABLE gtest30 ALTER COLUMN b DROP EXPRESSION;
> ```
> After this case is executed in the publication, the following error occurs in the subscription :
>
> ERROR: column "b" of relation "gtest30" is not a stored generated column
> STATEMENT: ALTER TABLE public.gtest30 ALTER COLUMN b DROP EXPRESSION, ALTER COLUMN b DROP EXPRESSION
>
> Obviously, the column modifications of the inherited table were also captured,
>
> and then deparse the wrong statement.
>
> I believe that such errors may also occur in other alter table subcmd scenarios where tables are inherited.
This is fixed in the attached v43 patch set. The fix is to skip
deparsing a subcommand if its objectId doesn't match the objectId of
the parent/wrapper ALTER TABLE command.
Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| v43-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.0 KB |
| v43-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v43-0002-Support-DDL-replication.patch | application/octet-stream | 133.5 KB |
| v43-0005-Skip-ALTER-TABLE-subcommands-generated-for-TableLike.patch | application/octet-stream | 2.2 KB |
| v43-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 317.9 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | li jie <ggysxcq(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-06 18:52:29 |
| Message-ID: | CALDaNm3w2hmCt+61PKQiGXvyn9dQYO8OoLmNehOwHzotCLfNug@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, 29 Nov 2022 at 17:51, li jie <ggysxcq(at)gmail(dot)com> wrote:
>
> I will continue to give feedback for this patch.
>
> 1. LIKE STORAGE
> ```
> CREATE TABLE ctlt (a text, c text);
> ALTER TABLE ctlt ALTER COLUMN c SET STORAGE EXTERNAL;
> CREATE TABLE ctlt_storage (LIKE ctlt INCLUDING STORAGE);
> ```
>
> postgres=# \d+ ctlt_storage
>                                   Table "public.ctlt_storage"
>  Column | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
> --------+------+-----------+----------+---------+----------+-------------+--------------+-------------
>  a      | text |           |          |         | extended |             |              |
>  c      | text |           |          |         | extended |             |              |
>
>
> It can be seen that the storage attribute in column C of table
> ctlt_storage is not replicated.
>
> After the CREATE TABLE LIKE statement is converted,
> the LIKE STORAGE attribute is lost because it is difficult to display
> it in the CREATE TABLE syntax.
> Maybe we need to add a statement to it, like 'ALTER TABLE ctlt_storage
> ALTER COLUMN c SET STORAGE EXTERNAL;'.
>
> 2. Reference subcommand be dropped.
> ```
> create table another (f1 int, f2 text, f3 text);
>
> alter table another
> alter f1 type text using f2 || ' and ' || f3 || ' more',
> alter f2 type bigint using f1 * 10,
> drop column f3;
> ```
>
> The following error occurs downstream:
> ERROR: column "?dropped?column?" does not exist at character 206
> STATEMENT: ALTER TABLE public.another DROP COLUMN f3 , ALTER COLUMN
> f1 SET DATA TYPE pg_catalog.text COLLATE pg_catalog."default" USING
> (((f2 OPERATOR(pg_catalog.||) ' and '::pg_catalog.text)
> OPERATOR(pg_catalog.||) "?dropped?column?") OPERATOR(pg_catalog.||) '
> more'::pg_catalog.text), ALTER COLUMN f2 SET DATA TYPE pg_catalog.int8
> USING (f1 OPERATOR(pg_catalog.*) 10)
>
> Obviously, column f3 has been deleted and its name no longer exists.
> Maybe we need to keep it and save it in advance like a drop object.
> However, ATLER TABLE is complex, and this problem also occurs in
> other similar scenarios.
Currently the event trigger is called after the execution of the
command, i.e. after column f3 has been dropped; since the column is
gone, we cannot evaluate its expression using pg_get_expr.
I fixed this issue by adding an event trigger before preparation of the
ALTER TYPE subcommand: it collects the expression before the actual
execution, and the command is deparsed later, after execution. Since
the evaluated expression is already stored, there is no need to fetch
the expression after the ALTER subcommands have executed.
The attached v44 version patch has the changes for the same. Thoughts?
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v44-0002-Support-DDL-replication.patch | text/x-patch | 133.5 KB |
| v44-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | text/x-patch | 15.0 KB |
| v44-0004-Test-cases-for-DDL-replication.patch | text/x-patch | 24.6 KB |
| v44-0005-Skip-ALTER-TABLE-subcommands-generated-for-Table.patch | text/x-patch | 2.2 KB |
| v44-0001-Functions-to-deparse-DDL-commands.patch | text/x-patch | 317.7 KB |
| v44-0006-Support-DDL-replication-of-alter-type-having-USI.patch | text/x-patch | 9.0 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-07 11:22:19 |
| Message-ID: | CALDaNm0Oi50cocMsd5kdBQyFK1wqXOqRQjrQ6_FcGd7BCCHtUA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, 2 Nov 2022 at 05:13, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Mon, 31 Oct 2022 at 16:17, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Thu, 27 Oct 2022 at 16:02, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > On Thu, 27 Oct 2022 at 02:09, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > > >
> > > > > Adding support for deparsing of CREATE/ALTER/DROP LANGUAGE for ddl replication.
> > > >
> > > > Adding support for deparsing of:
> > > > COMMENT
> > > > ALTER DEFAULT PRIVILEGES
> > > > CREATE/DROP ACCESS METHOD
> > >
> > > Adding support for deparsing of:
> > > ALTER/DROP ROUTINE
> > >
> > > The patch also includes fixes for the following issues:
> >
>
> Few comments:
> 1) Empty () should be appended in case if there are no table elements:
> + tableelts = deparse_TableElements(relation,
> node->tableElts, dpcontext,
> +
> false, /* not typed table */
> +
> false); /* not composite */
> + tableelts = obtainConstraints(tableelts, objectId, InvalidOid);
> +
> + append_array_object(createStmt, "(%{table_elements:,
> }s)", tableelts);
>
> This is required for:
> CREATE TABLE ihighway () INHERITS (road);
>
> 2)
> 2.a)
> Here cell2 will be of type RoleSpec, the below should be changed:
> + foreach(cell2, (List *) opt->arg)
> + {
> + String *val = lfirst_node(String, cell2);
> + ObjTree *obj =
> new_objtree_for_role(strVal(val));
> +
> + roles = lappend(roles, new_object_object(obj));
> + }
>
> to:
> foreach(cell2, (List *) opt->arg)
> {
> RoleSpec *rolespec = lfirst(cell2);
> ObjTree *obj = new_objtree_for_rolespec(rolespec);
>
> roles = lappend(roles, new_object_object(obj));
> }
>
> This change is required for:
> ALTER DEFAULT PRIVILEGES FOR ROLE regress_selinto_user REVOKE INSERT
> ON TABLES FROM regress_selinto_user;
>
> 2.b) After the above change the following function can be removed:
> +/*
> + * Helper routine for %{}R objects, with role specified by name.
> + */
> +static ObjTree *
> +new_objtree_for_role(char *rolename)
> +{
> + ObjTree *role;
> +
> + role = new_objtree_VA(NULL,2,
> + "is_public",
> ObjTypeBool, strcmp(rolename, "public") == 0,
> + "rolename",
> ObjTypeString, rolename);
> + return role;
> +}
>
> 3) There was a crash in this materialized view scenario:
> + /* add the query */
> + Assert(IsA(node->query, Query));
> + append_string_object(createStmt, "AS %{query}s",
> +
> pg_get_querydef((Query *) node->query, false));
> +
> + /* add a WITH NO DATA clause */
> + tmp = new_objtree_VA("WITH NO DATA", 1,
> + "present", ObjTypeBool,
> + node->into->skipData
> ? true : false);
>
> CREATE TABLE mvtest_t (id int NOT NULL PRIMARY KEY, type text NOT
> NULL, amt numeric NOT NULL);
> CREATE VIEW mvtest_tv AS SELECT type, sum(amt) AS totamt FROM mvtest_t
> GROUP BY type;
> CREATE VIEW mvtest_tvv AS SELECT sum(totamt) AS grandtot FROM mvtest_tv;
> CREATE MATERIALIZED VIEW mvtest_tvvm AS SELECT * FROM mvtest_tvv;
> CREATE VIEW mvtest_tvvmv AS SELECT * FROM mvtest_tvvm;
> CREATE MATERIALIZED VIEW mvtest_bb AS SELECT * FROM mvtest_tvvmv;
>
> #0 0x0000560d45637897 in AcquireRewriteLocks (parsetree=0x0,
> forExecute=false, forUpdatePushedDown=false) at rewriteHandler.c:154
> #1 0x0000560d45637b93 in AcquireRewriteLocks
> (parsetree=0x560d467c4778, forExecute=false,
> forUpdatePushedDown=false) at rewriteHandler.c:269
> #2 0x0000560d457f792a in get_query_def (query=0x560d467c4778,
> buf=0x7ffeb8059bd0, parentnamespace=0x0, resultDesc=0x0,
> colNamesVisible=true, prettyFlags=2, wrapColumn=0, startIndent=0) at
> ruleutils.c:5566
> #3 0x0000560d457ee869 in pg_get_querydef (query=0x560d467c4778,
> pretty=false) at ruleutils.c:1639
> #4 0x0000560d453437f6 in deparse_CreateTableAsStmt_vanilla
> (objectId=24591, parsetree=0x560d467c4748) at ddl_deparse.c:7076
> #5 0x0000560d45348864 in deparse_simple_command (cmd=0x560d467c3b98)
> at ddl_deparse.c:9158
> #6 0x0000560d45348b75 in deparse_utility_command (cmd=0x560d467c3b98,
> verbose_mode=false) at ddl_deparse.c:9273
> #7 0x0000560d45351627 in publication_deparse_ddl_command_end
> (fcinfo=0x7ffeb8059e90) at event_trigger.c:2517
> #8 0x0000560d4534eeb1 in EventTriggerInvoke
> (fn_oid_list=0x560d467b5450, trigdata=0x7ffeb8059ef0) at
> event_trigger.c:1082
> #9 0x0000560d4534e61c in EventTriggerDDLCommandEnd
> (parsetree=0x560d466e8a88) at event_trigger.c:732
> #10 0x0000560d456b6ee2 in ProcessUtilitySlow (pstate=0x560d467cdee8,
> pstmt=0x560d466e9a18, queryString=0x560d466e7c38 "CREATE MATERIALIZED
> VIEW mvtest_bb AS SELECT * FROM mvtest_tvvmv;",
> context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
> dest=0x560d467cb5d8, qc=0x7ffeb805a6f0) at utility.c:1926
In the case of a materialized view, if the subquery can be optimized,
the query tree is changed accordingly. We are then unable to get the
query definition from that tree, as it has been modified and some of
its nodes deleted. I have modified the code so that we make a copy of
the tree before the actual execution (before the tree gets changed).
The attached v45 patch has the changes for the same.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v45-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | text/x-patch | 15.7 KB |
| v45-0004-Test-cases-for-DDL-replication.patch | text/x-patch | 24.6 KB |
| v45-0005-Skip-ALTER-TABLE-subcommands-generated-for-Table.patch | text/x-patch | 2.2 KB |
| v45-0002-Support-DDL-replication.patch | text/x-patch | 133.5 KB |
| v45-0001-Functions-to-deparse-DDL-commands.patch | text/x-patch | 317.7 KB |
| v45-0006-Support-DDL-replication-of-alter-type-having-USI.patch | text/x-patch | 9.0 KB |
| From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-07 12:20:41 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
I think this patch is split badly.
You have:
0001 an enormous patch including some required infrastructure, plus the
DDL deparsing bits themselves.
0002 another enormous (though not as much) patch, this time for
DDL replication using the above.
0003 a bugfix for 0001, which includes changes in both the
infrastructure and the deparsing bits.
0004 test stuff for 0002.
0005 Another bugfix for 0001
0006 Another bugfix for 0001
As presented, I think it has very little chance of being reviewed
usefully. A better way to go about this, I think, would be:
0001 - infrastructure bits to support the DDL deparsing parts (all these
new functions in ruleutils.c, sequence.c, etc). That means, everything
(?) that's currently in your 0001 except ddl_deparse.c and friends.
Clearly there are several independent changes here; maybe it is possible
to break it down even further. This patch or these patches should also
include the parts of 0003, 0005, 0006 that require changes outside of
ddl_deparse.c.
I expect that this patch should be fairly small.
0002 - ddl_deparse.c and its very close friends. This should not have
any impact on places such as ruleutils.c, sequence.c, etc. The parts of
the bugfixes (0003, 0005, 0006) that touch this code should be merged
here as well; there's no reason to have them as separate patches. Some
test code should be here also, though it probably doesn't need to aim to
be complete.
This one is likely to be very large, but also self-contained.
0003 - ddlmessage.c and friends. I understand that DDL-messaging is
supporting infrastructure for DDL replication; I think it should be its
own patch. Probably include its own simple-ish test bits.
Not a very large patch.
0004 - DDL replication proper, including 0004.
Probably not a very large patch either, not sure.
Some review comments, just skimming:
- 0002 adds some functions to event_trigger.c, but that doesn't seem to
be their place. Maybe some new file in src/backend/replication/logical
would make more sense.
- publication_deparse_ddl_command_end has a long strcmp() list; why?
Maybe change things so that it compares some object type enum instead.
- CreatePublication has a long list of command tags; is that good?
Maybe it'd be better to annotate the list in cmdtaglist.h somehow.
- The change in pg_dump's getPublications needs to be updated to 16.
- Don't "git add" src/bin/pg_waldump/logicalddlmsgdesc.c, just update
its Makefile and meson.build
- I think psql's \dRp should not have the new column at the end.
Maybe one of:
+ Name | Owner | DDL | All tables | Inserts | Updates | Deletes | Truncates | Via root
+ Name | Owner | All tables | DDL | Inserts | Updates | Deletes | Truncates | Via root
+ Name | Owner | All tables | Inserts | Updates | Deletes | Truncates | DDL | Via root
(I would not add the "s" at the end of that column title, also).
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-08 05:07:15 |
| Message-ID: | CALDaNm2vOYJKi5jQFc3VJZ2CYGDEMr+hcJ4L8j0Kw=1KbvSHcQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, 7 Dec 2022 at 17:50, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> I think this patch is split badly.
>
> You have:
>
> 0001 an enormous patch including some required infrastructure, plus the
> DDL deparsing bits themselves.
>
> 0002 another enormous (though not as much) patch, this time for
> DDL replication using the above.
>
> 0003 a bugfix for 0001, which includes changes in both the
> infrastructure and the deparsing bits.
>
> 0004 test stuff for 0002.
>
> 0005 Another bugfix for 0001
>
> 0006 Another bugfix for 0001
>
> As presented, I think it has very little chance of being reviewed
> usefully. A better way to go about this, I think, would be:
>
> 0001 - infrastructure bits to support the DDL deparsing parts (all these
> new functions in ruleutils.c, sequence.c, etc). That means, everything
> (?) that's currently in your 0001 except ddl_deparse.c and friends.
> Clearly there are several independent changes here; maybe it is possible
> to break it down even further. This patch or these patches should also
> include the parts of 0003, 0005, 0006 that require changes outside of
> ddl_deparse.c.
> I expect that this patch should be fairly small.
>
> 0002 - ddl_deparse.c and its very close friends. This should not have
> any impact on places such as ruleutils.c, sequence.c, etc. The parts of
> the bugfixes (0001, 0005, 0006) that touch this could should be merged
> here as well; there's no reason to have them as separate patches. Some
> test code should be here also, though it probably doesn't need to aim to
> be complete.
> This one is likely to be very large, but also self-contained.
>
> 0003 - ddlmessage.c and friends. I understand that DDL-messaging is
> supporting infrastructure for DDL replication; I think it should be its
> own patch. Probably include its own simple-ish test bits.
> Not a very large patch.
>
> 0004 - DDL replication proper, including 0004.
> Probably not a very large patch either, not sure.
>
>
> Some review comments, just skimming:
> - 0002 adds some functions to event_trigger.c, but that doesn't seem to
> be their place. Maybe some new file in src/backend/replication/logical
> would make more sense.
>
> - publication_deparse_ddl_command_end has a long strcmp() list; why?
> Maybe change things so that it compares some object type enum instead.
>
> - CreatePublication has a long list of command tags; is that good?
> Maybe it'd be better to annotate the list in cmdtaglist.h somehow.
>
> - The change in pg_dump's getPublications needs updated to 16.
>
> - Don't "git add" src/bin/pg_waldump/logicalddlmsgdesc.c, just update
> its Makefile and meson.build
>
> - I think psql's \dRp should not have the new column at the end.
> Maybe one of:
> + Name | Owner | DDL | All tables | Inserts | Updates | Deletes | Truncates | Via root
> + Name | Owner | All tables | DDL | Inserts | Updates | Deletes | Truncates | Via root
> + Name | Owner | All tables | Inserts | Updates | Deletes | Truncates | DDL | Via root
> (I would not add the "s" at the end of that column title, also).
Thanks for the comments, these comments will make the patch reviewing easier.
There are a couple of review comments [1] and [2] which are spread
across the code, it will be difficult to handle this after
restructuring of the patch as the comments are spread across the code
in the patch. So we will handle [1] and [2] first and then work on
restructuring work suggested by you.
[1] - https://www.postgresql.org/message-id/CAHut%2BPsERMFwO8oK3LFH_3CRG%2B512T%2Bay_viWzrgNetbH2MwxA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAHut%2BPuxo_kq2toicNK_BQdeccK3REGW-Xv8tVauFvTNku6V-w%40mail.gmail.com
Regards,
Vignesh
| From: | li jie <ggysxcq(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-09 03:27:33 |
| Message-ID: | CAGfChW7H0L6JMU5pc0hGcMgWZiGyCUiZKgxhXO0J-4zBaMSuUw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
It is worth considering whether temporary objects, such as tables,
indexes, and sequences,
should be replicated to the subscriber side.
Take temp tables: different sessions create their own temp tables.
If they are all transferred to the subscriber side, errors are
inevitable, because there is only one subscription (apply) session.
I think temporary objects should not be part of replication because
they are only visible within the session that created them;
replicating them would not make them visible to the user and would
not be meaningful.
Here is a case:
```
create temp table t1(id int);
\c
create temp table t1(id int);
```
Thoughts?
li jie
vignesh C <vignesh21(at)gmail(dot)com> 于2022年12月8日周四 13:07写道:
>
> On Wed, 7 Dec 2022 at 17:50, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >
> > I think this patch is split badly.
> >
> > You have:
> >
> > 0001 an enormous patch including some required infrastructure, plus the
> > DDL deparsing bits themselves.
> >
> > 0002 another enormous (though not as much) patch, this time for
> > DDL replication using the above.
> >
> > 0003 a bugfix for 0001, which includes changes in both the
> > infrastructure and the deparsing bits.
> >
> > 0004 test stuff for 0002.
> >
> > 0005 Another bugfix for 0001
> >
> > 0006 Another bugfix for 0001
> >
> > As presented, I think it has very little chance of being reviewed
> > usefully. A better way to go about this, I think, would be:
> >
> > 0001 - infrastructure bits to support the DDL deparsing parts (all these
> > new functions in ruleutils.c, sequence.c, etc). That means, everything
> > (?) that's currently in your 0001 except ddl_deparse.c and friends.
> > Clearly there are several independent changes here; maybe it is possible
> > to break it down even further. This patch or these patches should also
> > include the parts of 0003, 0005, 0006 that require changes outside of
> > ddl_deparse.c.
> > I expect that this patch should be fairly small.
> >
> > 0002 - ddl_deparse.c and its very close friends. This should not have
> > any impact on places such as ruleutils.c, sequence.c, etc. The parts of
> > the bugfixes (0001, 0005, 0006) that touch this could should be merged
> > here as well; there's no reason to have them as separate patches. Some
> > test code should be here also, though it probably doesn't need to aim to
> > be complete.
> > This one is likely to be very large, but also self-contained.
> >
> > 0003 - ddlmessage.c and friends. I understand that DDL-messaging is
> > supporting infrastructure for DDL replication; I think it should be its
> > own patch. Probably include its own simple-ish test bits.
> > Not a very large patch.
> >
> > 0004 - DDL replication proper, including 0004.
> > Probably not a very large patch either, not sure.
> >
> >
> > Some review comments, just skimming:
> > - 0002 adds some functions to event_trigger.c, but that doesn't seem to
> > be their place. Maybe some new file in src/backend/replication/logical
> > would make more sense.
> >
> > - publication_deparse_ddl_command_end has a long strcmp() list; why?
> > Maybe change things so that it compares some object type enum instead.
> >
> > - CreatePublication has a long list of command tags; is that good?
> > Maybe it'd be better to annotate the list in cmdtaglist.h somehow.
> >
> > - The change in pg_dump's getPublications needs updated to 16.
> >
> > - Don't "git add" src/bin/pg_waldump/logicalddlmsgdesc.c, just update
> > its Makefile and meson.build
> >
> > - I think psql's \dRp should not have the new column at the end.
> > Maybe one of:
> > + Name | Owner | DDL | All tables | Inserts | Updates | Deletes | Truncates | Via root
> > + Name | Owner | All tables | DDL | Inserts | Updates | Deletes | Truncates | Via root
> > + Name | Owner | All tables | Inserts | Updates | Deletes | Truncates | DDL | Via root
> > (I would not add the "s" at the end of that column title, also).
>
> Thanks for the comments, these comments will make the patch reviewing easier.
> There are a couple of review comments [1] and [2] which are spread
> across the code, it will be difficult to handle this after
> restructuring of the patch as the comments are spread across the code
> in the patch. So we will handle [1] and [2] first and then work on
> restructuring work suggested by you.
>
> [1] - https://www.postgresql.org/message-id/CAHut%2BPsERMFwO8oK3LFH_3CRG%2B512T%2Bay_viWzrgNetbH2MwxA%40mail.gmail.com
> [2] - https://www.postgresql.org/message-id/CAHut%2BPuxo_kq2toicNK_BQdeccK3REGW-Xv8tVauFvTNku6V-w%40mail.gmail.com
>
> Regards,
> Vignesh
>
>
| From: | li jie <ggysxcq(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-09 03:31:59 |
| Message-ID: | CAGfChW4wrY2jkw+dX=mqwgEHov7kKUWmEANQ7O9TYkzeg1SLXQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
>
> Attached please find a new solution that skips the deparsing of ALTER TABLE
> subcommands generated for TableLikeClause. The patch v42-0005 added a new
> boolean field table_like to AlterTableStmt in order to identify an ALTER TABLE
> subcommand generated internally for the TableLikeClause.
>
> Regards,
> Zheng
I took a look at this patch and it appears to be incomplete.
> @@ -1974,6 +1974,7 @@ typedef struct AlterTableStmt
> List *cmds; /* list of subcommands */
> ObjectType objtype; /* type of object */
> bool missing_ok; /* skip error if table missing */
> + bool table_like; /* internally generated for TableLikeClause */
> } AlterTableStmt;
The new table_like field also needs to be handled by the `copynode`
and `equalnode` methods.
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | li jie <ggysxcq(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-09 10:36:05 |
| Message-ID: | CAA4eK1KPF_PFSuAzoXPFeLprkhyX0EtxeJK1+Fgq-wu-Byxprw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Dec 9, 2022 at 8:57 AM li jie <ggysxcq(at)gmail(dot)com> wrote:
>
> It is worth considering whether temporary objects, such as tables,
> indexes, and sequences,
> should be replicated to the subscriber side.
>
> Like temp tables, different sessions create their own temp tables.
> If they are all transferred to the subscriber side, there will
> inevitably be errors,
> because there is only one subscription session.
>
> I think temporary objects should not be part of replication because
> they are visible within the session.
> replicate them over would not make them visible to the user and would
> not be meaningful.
>
I also don't think we should replicate temp tables. We anyway don't
replicate their data.
--
With Regards,
Amit Kapila.
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | li jie <ggysxcq(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-12 04:58:27 |
| Message-ID: | CAAD30U+A=2rjZ+xejNz+e1A=udWPQMxHD8W48nbhxwJRfw_qrA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hi,
Attached please find the DDL deparser testing module in the v45-0007
patch; this testing module was written by Runqi Tian in [1] with minor
modifications from myself. I think we can
start adding more tests to the module now that we're getting close to
finishing the implementation of the DDL deparser.
This testing module, ddl_deparse_regress, aims to achieve the following
four testing goals for the DDL deparser:
1. Test that the generated JSON blob is as expected, using SQL tests.
2. Test that the re-formed DDL command is as expected, using SQL tests.
3. Test that the re-formed DDL command has the same effect as the
original command, by comparing the results of pg_dump using the SQL
tests in 1 and 2.
4. Test that any new DDL syntax is handled by the DDL deparser, by
capturing and deparsing the DDL commands run by pg_regress.
Goals 1 and 2 are tested with SQL tests, by comparing the deparsed JSON
blob and the re-formed command.
Goal 3 is tested with the TAP framework in t/001_compare_dumped_results.pl.
Goal 4 is tested with the TAP framework and pg_regress in
002_regress_tests.pl; its execution is currently commented out because
it will fail due to commands not yet implemented in the DDL deparser.
On Thu, Dec 8, 2022 at 10:32 PM li jie <ggysxcq(at)gmail(dot)com> wrote:
>
> >
> > Attached please find a new solution that skips the deparsing of ALTER TABLE
> > subcommands generated for TableLikeClause. The patch v42-0005 added a new
> > boolean field table_like to AlterTableStmt in order to identify an ALTER TABLE
> > subcommand generated internally for the TableLikeClause.
> >
> > Regards,
> > Zheng
>
> I took a look at this patch and it appears to be incomplete.
>
> > @@ -1974,6 +1974,7 @@ typedef struct AlterTableStmt
> > List *cmds; /* list of subcommands */
> > ObjectType objtype; /* type of object */
> > bool missing_ok; /* skip error if table missing */
> > + bool table_like; /* internally generated for TableLikeClause */
> > } AlterTableStmt;
>
> The table_like field should include implementations of the `copynode`
> and `equalnode` methods.
It looks like we don't need to manually update
copynode/equalnode/readnode/outnode anymore,
now that they are generated by ./src/backend/nodes/gen_node_support.pl.
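For context, here is a minimal sketch of the node as it looks with the
patch applied; the pre-existing fields are the stock AlterTableStmt
members, only table_like is new, and gen_node_support.pl produces the
copy/equal/out/read support for it at build time:

typedef struct AlterTableStmt
{
    NodeTag     type;
    RangeVar   *relation;       /* table to work on */
    List       *cmds;           /* list of subcommands */
    ObjectType  objtype;        /* type of object */
    bool        missing_ok;     /* skip error if table missing */
    bool        table_like;     /* internally generated for TableLikeClause */
} AlterTableStmt;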
> I think temporary objects should not be part of replication because
> they are visible only within their own session.
> Replicating them would not make them visible to the user and would
> not be meaningful.
We currently don't replicate temporary tables; see this check in
event_trigger.c:
/*
* Do not generate wal log for commands whose target table is a
* temporary table.
*
* We will generate wal logs for unlogged tables so that unlogged
* tables can also be created and altered on the subscriber side. This
* makes it possible to directly replay the SET LOGGED command and the
* incoming rewrite message without creating a new table.
*/
if (relpersist == RELPERSISTENCE_TEMP)
continue;
Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| v45-0002-Support-DDL-replication.patch | application/octet-stream | 133.5 KB |
| v45-0005-Skip-ALTER-TABLE-subcommands-generated-for-TableLike.patch | application/octet-stream | 2.2 KB |
| v45-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v45-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.7 KB |
| v45-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 317.7 KB |
| v45-0007-Introduce-the-test_ddl_deparse_regress-test-module.-.patch | application/octet-stream | 41.0 KB |
| v45-0006-Support-DDL-replication-of-alter-type-having-USING-e.patch | application/octet-stream | 9.0 KB |
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-12 12:20:29 |
| Message-ID: | CAFPTHDbp_opg-RcKHhvRuXiZx_xdS9Pz06QM6uZRyPPb62R7Sg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, Nov 15, 2022 at 10:57 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are some review comments for v32-0002
>
> ======
>
> 1. Commit message
>
> Comment says:
> While creating a publication, we register a command end
> trigger that deparses the DDL as a JSON blob, and WAL logs it. The event
> trigger is automatically removed at the time of drop publication.
>
> SUGGESTION (uppercase the SQL)
> During CREATE PUBLICATION we register a command end trigger that
> deparses the DDL as a JSON blob, and WAL logs it. The event
> trigger is automatically removed at the time of DROP PUBLICATION.
>
> ~~~
fixed.
>
> 2.
>
> Comment says:
> This is a POC patch to show how using event triggers and DDL deparsing
> facilities we can implement DDL replication. So, the implementation is
> restricted to CREATE TABLE/ALTER TABLE/DROP TABLE commands.
>
> ~
>
> Still correct or old comment gone stale?
>
Removed.
> ~~~
>
> 3.
>
> Comment says:
> Note that the replication for ALTER INDEX command is still under
> progress.
>
> ~
>
> Still correct or old comment gone stale?
>
Removed.
> ======
>
> 4. GENERAL - Patch order.
>
> Somehow, I feel this v32-0002 patch and the v32-0001 patch should be
> swapped. IIUC this one seems to me to be the "core" framework for the
> DDL message replication but the other 0001 was more like just the
> implements of all the supported different *kinds* of DDL JSON blobs.
> So actually this patch seems more like the mandatory one and the other
> one can just evolve as it gets more supported JSON.
>
I think there is a big patch reordering planned in future versions
based on this comment
and Alvaro's comment. Skipping this for now.
> ~~~
>
> 5. GENERAL - naming
>
> The DDL suffix 'msg' or 'message' seemed sometimes unnecessary because
> there is no ambiguity that this is a message for DDL replication, so
> the shorter name conveys the same amount of information, doesn't it?
>
> e.g. Maybe reconsider some of these ones (probably there are others)...
>
> src/include/replication/decode.h
> logicalddlmsg_decode -> Why not call this function logicalddl_decode?
>
> src/include/replication/logicalproto.h:
> LOGICAL_REP_MSG_DDLMESSAGE -> Why not call it 'LOGICAL_REP_MSG_DDL'?
> logicalrep_write_ddlmessage -> Why not call this function logicalrep_write_ddl?
> logicalrep_read_ddlmessage -> Why not call this function logicalrep_read_ddl?
>
> src/include/replication/output_plugin.h:
> 'ddlmessage_cb' -> Why not call it 'ddl_cb'?
> 'stream_ddlmessage_cb' -> Why not call it 'stream_ddl_cb'?
>
> src/include/replication/reorderbuffer.h:
> - 'REORDER_BUFFER_CHANGE_DDLMESSAGE' --> Why not call it 'REORDER_BUFFER_CHANGE_DDL'?
> - 'ddlmsg' -> Why not call it 'ddl'?
> - 'ddlmessage' -> Why not call it 'ddl'?
> - 'stream_ddlmessage' -> Why not call it 'stream_ddl'?
>
Fixed.
> ======
>
> src/backend/access/rmgrdesc/Makefile
>
> 6.
>
> @@ -19,6 +19,7 @@ OBJS = \
> hashdesc.o \
> heapdesc.o \
> logicalmsgdesc.o \
> + logicalddlmsgdesc.o \
>
> Change should be in alphabetical order.
>
Fixed.
> ======
>
> src/backend/access/rmgrdesc/logicalddlmsgdesc.c
>
> 7. logicalddlmsg_identify
>
> +const char *
> +logicalddlmsg_identify(uint8 info)
> +{
> + if ((info & ~XLR_INFO_MASK) == XLOG_LOGICAL_DDL_MESSAGE)
> + return "DDL MESSAGE";
> +
> + return NULL;
> +}
>
> The logicalrep_message_type (see below) said "DDL", so maybe this
> should also just say "DDL" instead of "DDL MESSAGE"
>
> @@ -1218,6 +1264,8 @@ logicalrep_message_type(LogicalRepMsgType action)
> return "TYPE";
> case LOGICAL_REP_MSG_MESSAGE:
> return "MESSAGE";
> + case LOGICAL_REP_MSG_DDLMESSAGE:
> + return "DDL";
>
Fixed.
> ======
>
> src/backend/commands/event_trigger.c
>
> 8. start/end
>
> +/*
> + * publication_deparse_ddl_command_start
> + *
> + * Deparse the ddl command and log it.
> + */
> +Datum
> +publication_deparse_ddl_command_start(PG_FUNCTION_ARGS)
> ...
> +/*
> + * publication_deparse_ddl_command_end
> + *
> + * Deparse the ddl command and log it.
> + */
> +Datum
> +publication_deparse_ddl_command_end(PG_FUNCTION_ARGS)
>
> The start/end function comments are the same -- there should be some
> more explanation to say what they are for.
>
Updated with a more detailed explanation.
> ~~~
>
> 9. publication_deparse_ddl_command_start
>
> + char *command = psprintf("Drop table command start");
>
> Huh? So this function is only for this specific case of DROP TABLE? If
> correct, then I think that should be commented on or asserted
> somewhere.
>
Updated the comments specifying this.
> ~
>
> 10.
>
> + /* extract the relid from the parse tree */
> + foreach(cell1, stmt->objects)
>
> Uppercase comment
>
Fixed.
> ~
>
> 11.
>
> + if (relpersist == RELPERSISTENCE_TEMP)
> + {
> + table_close(relation, NoLock);
> + continue;
> + }
> +
> + LogLogicalDDLMessage("deparse", address.objectId, DCT_TableDropStart,
> + command, strlen(command) + 1);
> +
> + if (relation)
> + table_close(relation, NoLock);
>
> This code looks overly complex. Can't it just be like below?
>
> SUGGESTION
>
> if (relpersist != RELPERSISTENCE_TEMP)
> LogLogicalDDLMessage("deparse", address.objectId, DCT_TableDropStart,
> command, strlen(command) + 1);
>
> if (relation)
> table_close(relation, NoLock);
>
> ~~~
>
> 12. publication_deparse_table_rewrite
>
> + if (relpersist == RELPERSISTENCE_TEMP)
> + return PointerGetDatum(NULL);
> +
> + /* Deparse the DDL command and WAL log it to allow decoding of the same. */
> + json_string = deparse_utility_command(cmd, false);
> +
> + if (json_string != NULL)
> + LogLogicalDDLMessage("deparse", cmd->d.alterTable.objectId, DCT_TableAlter,
> + json_string, strlen(json_string) + 1);
> +
> + return PointerGetDatum(NULL);
>
> Similar to previous comment I think this can be simplified so there is
> only one return
>
> SUGGESTION
>
> if (relpersist != RELPERSISTENCE_TEMP)
> {
> /* Deparse the DDL command and WAL log it to allow decoding of the same. */
> json_string = deparse_utility_command(cmd, false);
>
> if (json_string != NULL)
> LogLogicalDDLMessage("deparse", cmd->d.alterTable.objectId, DCT_TableAlter,
> json_string, strlen(json_string) + 1);
> }
>
> return PointerGetDatum(NULL);
>
Fixed as described above.
> ~~~
>
> 13. publication_deparse_ddl_command_end
>
> + if (relpersist == RELPERSISTENCE_TEMP)
> + continue;
> +
> + /*
> + * Deparse the DDL command and WAL log it to allow decoding of the
> + * same.
> + */
> + json_string = deparse_utility_command(cmd, false);
> +
> + if (json_string == NULL)
> + continue;
> +
> + LogLogicalDDLMessage("deparse", relid, type, json_string,
> + strlen(json_string) + 1);
>
> Maybe this logic is simpler without all the continue?
>
> SUGGESTION
>
> if (relpersist != RELPERSISTENCE_TEMP)
> {
> /*
> * Deparse the DDL command and WAL log it to allow decoding of the
> * same.
> */
> json_string = deparse_utility_command(cmd, false);
>
> if (json_string != NULL)
> LogLogicalDDLMessage("deparse", relid, type, json_string,
> strlen(json_string) + 1);
> }
>
Fixed.
> ~
>
> 14.
>
> + if (strcmp(obj->objecttype, "table") == 0)
> + cmdtype = DCT_TableDropEnd;
> + else if (strcmp(obj->objecttype, "sequence") == 0 ||
> + strcmp(obj->objecttype, "schema") == 0 ||
> + strcmp(obj->objecttype, "index") == 0 ||
> + strcmp(obj->objecttype, "function") == 0 ||
> + strcmp(obj->objecttype, "procedure") == 0 ||
> + strcmp(obj->objecttype, "operator") == 0 ||
> + strcmp(obj->objecttype, "operator class") == 0 ||
> + strcmp(obj->objecttype, "operator family") == 0 ||
> + strcmp(obj->objecttype, "cast") == 0 ||
> + strcmp(obj->objecttype, "type") == 0 ||
> + strcmp(obj->objecttype, "domain") == 0 ||
> + strcmp(obj->objecttype, "trigger") == 0 ||
> + strcmp(obj->objecttype, "conversion") == 0 ||
> + strcmp(obj->objecttype, "policy") == 0 ||
> + strcmp(obj->objecttype, "rule") == 0 ||
> + strcmp(obj->objecttype, "extension") == 0 ||
> + strcmp(obj->objecttype, "foreign-data wrapper") == 0 ||
> + strcmp(obj->objecttype, "text search configuration") == 0 ||
> + strcmp(obj->objecttype, "text search dictionary") == 0 ||
> + strcmp(obj->objecttype, "text search parser") == 0 ||
> + strcmp(obj->objecttype, "text search template") == 0 ||
> + strcmp(obj->objecttype, "transform") == 0 ||
> + strcmp(obj->objecttype, "server") == 0 ||
> + strcmp(obj->objecttype, "collation") == 0 ||
> + strcmp(obj->objecttype, "user mapping") == 0 ||
> + strcmp(obj->objecttype, "language") == 0 ||
> + strcmp(obj->objecttype, "view") == 0 ||
> + strcmp(obj->objecttype, "materialized view") == 0 ||
> + strcmp(obj->objecttype, "statistics object") == 0 ||
> + strcmp(obj->objecttype, "access method") == 0)
> + cmdtype = DCT_ObjectDrop;
> + else
> + continue;
> +
> + /* Change foreign-data wrapper to foreign data wrapper */
> + if (strncmp(obj->objecttype, "foreign-data wrapper", 20) == 0)
> + {
> + tmptype = pstrdup("foreign data wrapper");
> + command = deparse_drop_command(obj->objidentity, tmptype,
> + stmt->behavior);
> + }
> +
> + /* Change statistics object to statistics */
> + else if (strncmp(obj->objecttype, "statistics object",
> + strlen("statistics object")) == 0)
> + {
> + tmptype = pstrdup("statistics");
> + command = deparse_drop_command(obj->objidentity, tmptype,
> + stmt->behavior);
> + }
> +
> + /*
> + * object identity needs to be modified to make the drop work
> + *
> + * FROM: <role> on server <servername> TO : for >role> server
> + * <servername>
> + *
> + */
> + else if (strncmp(obj->objecttype, "user mapping", 12) == 0)
> + {
> + char *on_server;
> +
> + tmptype = palloc(strlen(obj->objidentity) + 2);
> + on_server = strstr(obj->objidentity, "on server");
> +
> + sprintf((char *) tmptype, "for ");
> + strncat((char *) tmptype, obj->objidentity, on_server - obj->objidentity);
> + strcat((char *) tmptype, on_server + 3);
> + command = deparse_drop_command(tmptype, obj->objecttype,
> + stmt->behavior);
> + }
> + else
> + command = deparse_drop_command(obj->objidentity, obj->objecttype,
> + stmt->behavior);
>
> 14a.
> Why are some of these implemented as strcmp and others are implemented
> as strncmp?
>
> ~
>
> 14b.
> The mass strcmp seems inefficient. The same could be done in other ways like:
> - use a single strstr call (where all the possibilities are in one large string)
> - pass string representation of some enum and just switch on it
> - etc.
>
> ~
I considered modifying this, but using one large string could produce
false positives when an object type name overlaps parts of adjoining
words in the combined string.
This will definitely require refactoring, but I am not sure what the
best way is.
Maybe a simple 'for' loop searching an array of strings is better; a
sketch of that idea follows.
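A minimal sketch of that idea, for illustration only (the helper name
lookup_drop_cmdtype and the array are not from the posted patches;
DCT_TableDropEnd and DCT_ObjectDrop are the command types used in the
quoted code):

static const char *const droppable_objtypes[] = {
    "sequence", "schema", "index", "function", "procedure",
    "operator", "operator class", "operator family", "cast",
    "type", "domain", "trigger", "conversion", "policy", "rule",
    "extension", "foreign-data wrapper", "text search configuration",
    "text search dictionary", "text search parser",
    "text search template", "transform", "server", "collation",
    "user mapping", "language", "view", "materialized view",
    "statistics object", "access method"
};

static bool
lookup_drop_cmdtype(const char *objecttype, DeparsedCommandType *cmdtype)
{
    /* Tables get their own command type. */
    if (strcmp(objecttype, "table") == 0)
    {
        *cmdtype = DCT_TableDropEnd;
        return true;
    }

    /* Everything else in the list maps to the generic object drop. */
    for (int i = 0; i < lengthof(droppable_objtypes); i++)
    {
        if (strcmp(objecttype, droppable_objtypes[i]) == 0)
        {
            *cmdtype = DCT_ObjectDrop;
            return true;
        }
    }

    return false;               /* caller should skip this object */
}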
>
> 15.
>
> + /*
> + * object identity needs to be modified to make the drop work
> + *
> + * FROM: <role> on server <servername> TO : for >role> server
> + * <servername>
> + *
> + */
>
> The comment needs fixing.
>
Fixed.
> ~
>
> 16.
>
> + if (command == NULL)
> + continue;
> +
> + LogLogicalDDLMessage("deparse", obj->address.objectId, cmdtype,
> + command, strlen(command) + 1);
>
> SUGGESTION
>
> if (command)
> LogLogicalDDLMessage("deparse", obj->address.objectId, cmdtype,
> command, strlen(command) + 1);
>
Fixed.
> ======
>
> src/backend/commands/publicationcmds.c
>
>
> 17. CreateDDLReplicaEventTrigger
>
> + static const char *trigger_name_prefix = "pg_deparse_trig_%s_%u";
> + static const char *trigger_func_prefix = "publication_deparse_%s";
>
> 17a.
> I felt the ddl deparse trigger name should have the name "ddl" in it somewhere
>
It is there in the second half of the name, which is filled in with the
event name:
1. ddl_command_end
2. ddl_command_start
> ~
>
> 17b.
> Why are these called "prefixes" ?? - They looked more just like name
> format strings to me.
>
The caller supplies the second half of the string (the event name) as a
suffix; this function sets the fixed first half, hence "prefix". An
example expansion is sketched below.
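For illustration, this is how the two format strings are expected to
expand; using the publication OID as the %u argument is an assumption
on my part, since the quoted hunk does not show it:

    Oid         puboid = 16384;     /* example publication OID (assumed) */
    char       *trigger_name;
    char       *trigger_func;

    /* "pg_deparse_trig_%s_%u" filled with the event name and the OID */
    trigger_name = psprintf("pg_deparse_trig_%s_%u",
                            "ddl_command_end", puboid);
    /* -> "pg_deparse_trig_ddl_command_end_16384" */

    /* "publication_deparse_%s" filled with the same event name */
    trigger_func = psprintf("publication_deparse_%s", "ddl_command_end");
    /* -> "publication_deparse_ddl_command_end" */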
> ~~~
>
> 18. CreatePublication
>
> + /*
> + * Create an event trigger to allow logging of DDL statements.
> + *
> + * TODO: We need to find a better syntax to allow replication of DDL
> + * statements.
> + *
> + * XXX: This code is just to show the replication of CREATE/ALTER/DROP
> + * TABLE works. We need to enhance this once the approach for DDL
> + * replication is finalized.
> + */
> + if (pubactions.pubddl)
>
> This comment needs updating.
>
Fixed.
> ~
>
> 19.
>
> + CommandTag end_commands[] = {
> + CMDTAG_CREATE_ACCESS_METHOD,
> + CMDTAG_DROP_ACCESS_METHOD,
> + CMDTAG_ALTER_DEFAULT_PRIVILEGES,
> + CMDTAG_COMMENT,
> + CMDTAG_CREATE_LANGUAGE,
> + CMDTAG_ALTER_LANGUAGE,
> + CMDTAG_DROP_LANGUAGE,
> + CMDTAG_CREATE_VIEW,
> + CMDTAG_ALTER_VIEW,
> + CMDTAG_DROP_VIEW,
> + CMDTAG_CREATE_MATERIALIZED_VIEW,
>
> 19a.
> Some better ordering (e.g. A-Z) can be done here, and maybe use blank
> lines to make the groupings more obbious.
>
Fixed.
> ~
>
> 19b.
> Wouldn't it be better to declare these static?
>
>
Fixed.
> ======
>
> src/backend/replication/logical/Makefile
>
> 20.
>
> OBJS = \
> decode.o \
> + ddlmessage.o\
> launcher.o \
> Change should be in alphabetical order.
>
Fixed.
> ======
>
> src/backend/replication/logical/ddlmessage.c
>
> 21. File Comment
>
> + * Unlike generic logical messages, these DDL messages have only transactional
> + * mode.Note by default DDLs in PostgreSQL are transactional.
>
> Missing space before "Note"
>
Fixed.
> ~~~
>
> 22. LogLogicalDDLMessage
>
> + /*
> + * Ensure we have a valid transaction id.
> + */
> + Assert(IsTransactionState());
> + GetCurrentTransactionId();
>
> Single line comment should be OK here
>
Fixed.
> ~
>
> 23.
>
> + /* trailing zero is critical; see logicalddlmsg_desc */
>
> Uppercase comment
>
fixed.
> ~
>
> 24.
>
> + /* allow origin filtering */
>
> Uppercase comment
>
fixed.
> ======
>
> src/backend/replication/logical/proto.c
>
> 25. logicalrep_read_ddlmessage
>
> + uint8 flags;
> + char *msg;
> +
> + //TODO double check when do we need to get TransactionId.
> +
> + flags = pq_getmsgint(in, 1);
> + if (flags != 0)
> + elog(ERROR, "unrecognized flags %u in ddl message", flags);
> + *lsn = pq_getmsgint64(in);
> + *prefix = pq_getmsgstring(in);
> + *sz = pq_getmsgint(in, 4);
> + msg = (char *) pq_getmsgbytes(in, *sz);
> +
> + return msg;
>
> 25a.
> This code will fail if the associated *write* function has sent a xid.
> Maybe additional param is needed to tell it when to read the xid?
>
Removed; we no longer send the xid, as it is not required.
> ~
>
> 25b.
> Will be tidier to have a blank line after the elog
>
fixed.
> ~~~
>
> 26. logicalrep_write_ddlmessage
>
> + /* transaction ID (if not valid, we're not streaming) */
> + if (TransactionIdIsValid(xid))
> + pq_sendint32(out, xid);
>
> Perhaps this "write" function should *always* write the xid even if it
> is invalid because then the "read" function will know to always read
> it.
>
Changed it to never send the xid; a write-side sketch without the xid
follows.
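For reference, here is a write-side sketch that mirrors the quoted read
function once the xid is dropped; this is not the patch's exact code,
and the leading type byte uses the LOGICAL_REP_MSG_DDLMESSAGE value
from the patch:

static void
write_ddlmessage_sketch(StringInfo out, XLogRecPtr lsn,
                        const char *prefix, Size sz, const char *message)
{
    pq_sendbyte(out, LOGICAL_REP_MSG_DDLMESSAGE);   /* 'L' in the patch */
    pq_sendint8(out, 0);        /* flags; the reader requires 0 */
    pq_sendint64(out, lsn);
    pq_sendstring(out, prefix); /* null-terminated prefix */
    pq_sendint32(out, sz);
    pq_sendbytes(out, message, sz);
}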
> ======
>
> src/backend/replication/logical/reorderbuffer.c
>
> 27. ReorderBufferQueueDDLMessage
>
> + Assert(xid != InvalidTransactionId);
>
> SUGGESTION
> Assert(TransactionIdIsValid(xid));
>
fixed.
> ~~~
>
> 28. ReorderBufferSerializeChange
>
> + data += sizeof(int);
> + memcpy(data, change->data.ddlmsg.prefix,
> + prefix_size);
> + data += prefix_size;
>
> Unnecessary wrapping of memcpy.
>
fixed.
> ~
>
> 29.
>
> + memcpy(data, &change->data.ddlmsg.cmdtype, sizeof(int));
> + data += sizeof(int);
>
> Would that be better to write as:
>
> sizeof(DeparsedCommandType) instead of sizeof(int)
>
fixed.
> ~~~
>
> 30. ReorderBufferChangeSize
>
> + case REORDER_BUFFER_CHANGE_DDLMESSAGE:
> + {
> + Size prefix_size = strlen(change->data.ddlmsg.prefix) + 1;
> +
> + sz += prefix_size + change->data.ddlmsg.message_size +
> + sizeof(Size) + sizeof(Size) + sizeof(Oid) + sizeof(int);
>
> sizeof(DeparsedCommandType) instead of sizeof(int)
>
fixed.
Breaking this into two mails, next set of comments in next mail.
regards,
Ajin Cherian
Fujitsu Australia
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-12 13:13:55 |
| Message-ID: | CAFPTHDbaFByXyzEts+wZR5JLcCit50_S_j7qMF4tNeSvSuxZDA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, Nov 15, 2022 at 10:57 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> ======
>
> src/backend/replication/logical/worker.c
>
> 32. preprocess_create_table
>
> +/* Remove the data population from the command */
> +static void
> +preprocess_create_table(RawStmt *command)
>
> The comment is too short. Needs more explanation than this.
>
fixed.
> ~~~
>
> 33. handle_create_table
>
> +/*
> + * Handle CREATE TABLE command
> + *
> + * Call AddSubscriptionRelState for CREATE TABEL command to set the relstate to
> + * SUBREL_STATE_READY so DML changes on this new table can be
> replicated without
> + * having to manually run "alter subscription ... refresh publication"
> + */
>
> Typo "TABEL"
>
fixed.
> ~~~
>
> 34. handle_create_table
>
> + switch (commandTag)
> + {
> + case CMDTAG_CREATE_TABLE:
> + {
> + CreateStmt *cstmt = (CreateStmt *) command->stmt;
> +
> + rv = cstmt->relation;
> + }
> + break;
> + default:
> + break;
> + }
> +
> + if (!rv)
> + return;
>
> This switch seems overcomplicated since the function only cares about
> CMDTAG_CREATE_TABLE.
>
> SUGGESTION
>
> if (commandTag == CMDTAG_CREATE_TABLE)
> {
> CreateStmt *cstmt = (CreateStmt *) command->stmt;
> rv = cstmt->relation;
> }
> else
> {
> return;
> }
>
fixed as suggested.
> ~
>
> 35.
>
> + if (relnamespace != InvalidOid)
> + relid = get_relname_relid(relname, relnamespace);
> + else
> + relid = RelnameGetRelid(relname);
> +
> + if (relid != InvalidOid)
> + {
>
> 35a.
> Maybe better to use the OidIsValid() macro for these places
>
fixed.
> ~
>
> 35b.
> I'm not 100% sure of this logic. Is it even *possible* for these to be
> InvalidOid -- e.g. I thought the CREATE TABLE would have failed
> already if this was the case. Maybe these checks can be changed to
> Asserts?
>
Theoretically somebody could have deleted the table in the meantime.
> ~~~
>
> 36. apply_handle_ddl
>
> +
> +static void
> +apply_handle_ddl(StringInfo s)
>
> Missing function comment
>
added comment.
> ======
>
> src/backend/replication/pgoutput/pgoutput.c
>
> 37. pgoutput_change
>
> @@ -1377,9 +1386,22 @@ pgoutput_change(LogicalDecodingContext *ctx,
> ReorderBufferTXN *txn,
> ReorderBufferChangeType action = change->action;
> TupleTableSlot *old_slot = NULL;
> TupleTableSlot *new_slot = NULL;
> + bool table_rewrite = false;
>
> update_replication_progress(ctx, false);
>
> + /*
> + * For heap rewrites, we might need to replicate them if the rewritten
> + * table publishes rewrite ddl message. So get the actual relation here
> + * and check the pubaction later.
> + */
> + if (relation->rd_rel->relrewrite)
> + {
> + table_rewrite = true;
> + relation = RelationIdGetRelation(relation->rd_rel->relrewrite);
> + targetrel = relation;
> + }
> +
> if (!is_publishable_relation(relation))
> return;
>
> @@ -1413,6 +1435,13 @@ pgoutput_change(LogicalDecodingContext *ctx,
> ReorderBufferTXN *txn,
> Assert(false);
> }
>
> + /*
> + * We don't publish table rewrite change unless we publish the rewrite ddl
> + * message.
> + */
> + if (table_rewrite && !relentry->pubactions.pubddl)
> + return;
> +
>
> Something does not seem right. Other code later in this function takes
> care to call RelationClose(relation), but in the above change, the
> logic is just returning without closing anything.
>
There is code just above this that also returns early when the
publication action does not match the change action.
> ~~~
>
> 38. pgoutput_message
>
> @@ -1671,8 +1714,8 @@ pgoutput_truncate(LogicalDecodingContext *ctx,
> ReorderBufferTXN *txn,
>
> static void
> pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
> - XLogRecPtr message_lsn, bool transactional, const char *prefix, Size sz,
> - const char *message)
> + XLogRecPtr message_lsn, bool transactional,
> + const char *prefix, Size sz, const char *message)
> {
>
> This change of wrapping seems unrelated , so should not be done in this patch.
>
removed.
> ~~~
>
> 39. pgoutput_ddlmessage
>
> +static void
> +pgoutput_ddlmessage(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
> + XLogRecPtr message_lsn,
> + const char *prefix, Oid relid, DeparsedCommandType cmdtype,
> + Size sz, const char *message)
>
> Missing function comment.
>
Added comment.
> ~
>
> 40.
>
> + switch (cmdtype)
>
> 40a.
> Might be tidier to have a consistent space *before* each case of this switch.
>
fixed.
> ~
>
> 40b.
> I felt it was too confusing having some of the switch case break and
> some of the switch cases return from the function -- e.g It seems
> difficult to know what conditions will execute the code that follows
> the switch. Maybe all this needs to be refactored somehow, or just
> commented on more.
>
added more comments.
> ======
>
> src/bin/pg_dump/pg_dump.c
>
> 41. getPublications
>
> - if (fout->remoteVersion >= 130000)
> + if (fout->remoteVersion >= 150000)
>
> Should be >= 160000, right?
>
fixed.
> ~
>
> 42.
>
> else if (fout->remoteVersion >= 110000)
> appendPQExpBufferStr(query,
> "SELECT p.tableoid, p.oid, p.pubname, "
> "p.pubowner, "
> - "p.puballtables, p.pubinsert, p.pubupdate, p.pubdelete,
> p.pubtruncate, false AS pubviaroot "
> + "p.puballtables, p.pubinsert, p.pubupdate, p.pubdelete,
> p.pubtruncate, false as p.pubddl, false AS pubviaroot "
> "FROM pg_publication p");
> else
> appendPQExpBufferStr(query,
> "SELECT p.tableoid, p.oid, p.pubname, "
> "p.pubowner, "
> - "p.puballtables, p.pubinsert, p.pubupdate, p.pubdelete, false AS
> pubtruncate, false AS pubviaroot "
> + "p.puballtables, p.pubinsert, p.pubupdate, p.pubdelete, false AS
> pubtruncate, false as p.pubddl, false AS pubviaroot "
> "FROM pg_publication p");
>
> Use uppercase 'AS' for consistency with other code.
>
fixed.
> ======
>
> src/bin/pg_dump/pg_dump.h
>
> 43. PublicationInfo
>
> @@ -620,6 +620,7 @@ typedef struct _PublicationInfo
> bool pubdelete;
> bool pubtruncate;
> bool pubviaroot;
> + bool pubddl;
> } PublicationInfo;
>
> IMO the new member should be adjacent to the other 'publish' parameter
> values like pubdelete/pubtruncate.
>
I have moved this member up.
> ======
>
> src/bin/psql/describe.c
>
> 44. listPublications
>
> + if (pset.sversion >= 140000)
> + appendPQExpBuffer(&buf,
> + ",\n pubddl AS \"%s\"",
> + gettext_noop("DDLs"));
>
> 44a.
> Should that be 160000?
>
updated.
> ~
>
> 44b.
> IMO it would be better if the "DDLs" column appeared adjacent to the
> other 'publish' parameter option values (e.g. these do not even use the
> same column ordering as pg_dump).
>
> ~~~
>
> 45. describePublications
>
> has_pubtruncate = (pset.sversion >= 110000);
> has_pubviaroot = (pset.sversion >= 130000);
> + has_pubddl = (pset.sversion >= 150000);
>
> Shouldn't that be 160000?
>
> ~
>
> 46.
>
> @@ -6313,6 +6319,9 @@ describePublications(const char *pattern)
> if (has_pubviaroot)
> appendPQExpBufferStr(&buf,
> ", pubviaroot");
> + if (has_pubddl)
> + appendPQExpBufferStr(&buf,
> + ", pubddl");
>
> IMO it would be better if the "DDLs" column appeared adjacent to the
> other 'publish' parameter option values (e.g. these do not even use the
> same column ordering as pg_dump).
>
Will fix this in a future patch.
>
> ======
>
> src/include/catalog/pg_proc.dat
>
> 47.
>
> +{ oid => '4644', descr => 'trigger for ddl command deparse',
> + proname => 'publication_deparse_ddl_command_end', prorettype =>
> 'event_trigger',
> + proargtypes => '', prosrc => 'publication_deparse_ddl_command_end' },
>
> Why doesn't the description say 'end'?
>
fixed this.
> ======
>
> src/include/catalog/pg_publication.h
>
> 48. FormData_pg_publication
>
> +
> + /* true if table creations are published */
> + bool pubddl;
> } FormData_pg_publication;
>
> Why just table publications? I thought it was for EVERYTHING.
>
fixed.
> ~~~
>
> 49. PublicationActions
>
> + bool pubddl;
> } PublicationActions;
>
> This might be OK for POC, but for the real feature, I think this
> should be more fine-grained than this all-or-nothing DDL.
>
yes, we will need to rethink this.
> ======
>
> src/include/replication/ddlmessage.h
>
> 50.
>
> +{
> + Oid dbId; /* database Oid emitted from */
> + Size prefix_size; /* length of prefix */
> + Oid relid; /* id of the table */
> + DeparsedCommandType cmdtype; /* type of sql command */
> + Size message_size; /* size of the message */
> +
> + /*
> + * payload, including null-terminated prefix of length prefix_size
> + */
> + char message[FLEXIBLE_ARRAY_MEMBER];
> +} xl_logical_ddl_message;
>
>
> 50a.
> The prefix_size comment needs to say /* length of the prefix
> (including '\0' terminator) */
>
fixed.
> ~
>
> 50b.
> 'relid' seems specific to TABLE DDL. Will future versions have many
> more Oid members here? Or should this be a union member or a generic
> name like 'objid'?
>
It is specific to tables; it is only used to check whether the table is
part of the publication.
All other object types are handled as a catch-all.
> ~~~
>
> 51. XLOG_LOGICAL_DDL_MESSAGE
>
> +/* RMGR API*/
> +#define XLOG_LOGICAL_DDL_MESSAGE 0x00
>
> 0x00 is same value as XLOG_LOGICAL_MESSAGE in message.h. That doesn't
> seem correct because then how will those different messages be
> identified?
>
Currently, generic logical messages are not handled by subscriptions,
so the same value is overloaded.
> ======
>
> src/include/replication/logicalproto.h
>
> 52. LogicalRepMsgType
>
> @@ -61,6 +61,7 @@ typedef enum LogicalRepMsgType
> LOGICAL_REP_MSG_RELATION = 'R',
> LOGICAL_REP_MSG_TYPE = 'Y',
> LOGICAL_REP_MSG_MESSAGE = 'M',
> + LOGICAL_REP_MSG_DDLMESSAGE = 'L',
> LOGICAL_REP_MSG_BEGIN_PREPARE = 'b',
>
> The name already includes _MSG_ so why say MESSAGE again? IMO this
> should be called just LOGICAL_REP_MSG_DDL. See general comment.
>
fixed.
> ~~~
>
> 53.
>
> extern void logicalrep_write_message(StringInfo out, TransactionId
> xid, XLogRecPtr lsn,
> - bool transactional, const char *prefix, Size sz, const char *message);
> + bool transactional, const char *prefix,
> + Size sz, const char *message);
>
> Modifying the wrapping of this unrelated function should not be done
> in this patch.
>
fixed.
> ======
>
> src/include/replication/reorderbuffer.h
>
> 54. REORDER_BUFFER_CHANGE_DDLMESSAGE
>
> @@ -56,6 +58,7 @@ typedef enum ReorderBufferChangeType
> REORDER_BUFFER_CHANGE_INSERT,
> REORDER_BUFFER_CHANGE_UPDATE,
> REORDER_BUFFER_CHANGE_DELETE,
> + REORDER_BUFFER_CHANGE_DDLMESSAGE,
>
> Why not call it REORDER_BUFFER_CHANGE_DDL? -- see general review comment
>
fixed.
> ~~~
>
> 55. ReorderBufferChange
>
> + /* DDL Message. */
> + struct
> + {
> + char *prefix;
> + Size message_size;
> + char *message;
> + Oid relid;
> + DeparsedCommandType cmdtype;
> + } ddlmsg;
> +
>
> Why not call it ddl? -- see general review comment
>
fixed.
> ======
>
> src/test/regress/expected/psql.out
>
> 56.
>
> \dRp "no.such.publication"
> - List of publications
> - Name | Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
> -------+-------+------------+---------+---------+---------+-----------+----------
> + List of publications
> + Name | Owner | All tables | Inserts | Updates | Deletes | Truncates
> | Via root | DDLs
> +------+-------+------------+---------+---------+---------+-----------+----------+------
> (0 rows)
>
> I wondered if "DDLs" belongs adjacent to the
> Inserts/Updates/Deletes/Truncates because those are the other "publish"
> parameters just like this.
>
Will fix this in a future patch.
> ======
>
> src/test/regress/expected/publication.out
>
> 57.
>
> (Ditto comment for psql.out)
>
> I wondered if "DDLs" belongs adjacent to the
> Inserts/Updates/Deletes/Truncates because those are the other "publish"
> parameters just like this.
>
> ~~~
>
> 58.
>
> Looks like there is a missing regress test case where you actually set
> the publish='ddl' and then verify that the DDLs column is correctly
> set 't'?
>
Lots of tests are missing; will update in a future patch.
> ======
>
> 59. MISC = typedefs.list
>
> There are missing some typedefs.list changes for this patch. At least
> the following:
>
> e.g.
> - DeparsedCommandType (from ddlmessage.h)
> - xl_logical_ddl_message (from ddlmessage.h)
> - LogicalDecodeDDLMessageCB (from output_plugin.h)
> - LogicalDecodeStreamDDLMessageCB (from output_plugin.h)
> - ReorderBufferDDLMessageCB (from reorderbuffer.h)
> - ReorderBufferStreamDDLMessageCB (from reorderbuffer.h)
>
added.
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v46-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v46-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 317.5 KB |
| v46-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.6 KB |
| v46-0002-Support-DDL-replication.patch | application/octet-stream | 132.2 KB |
| v46-0005-Skip-ALTER-TABLE-subcommands-generated-for.patch | application/octet-stream | 2.2 KB |
| v46-0006-Support-DDL-replication-of-alter-type-having-USI.patch | application/octet-stream | 8.9 KB |
| v46-0007-Introduce-the-test_ddl_deparse_regress-test-module.-.patch | application/octet-stream | 41.0 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-12 17:40:19 |
| Message-ID: | CAAD30UL_ZqvikjE0SpJ=1e1RNYeQSYq5ehHnZ0uez6x7d-7nfg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hi,
Patch v46-0001 no longer compiles after the recent commit
840ff5f451cd9a391d237fc60894fea7ad82a189; I resolved the conflicts in
the v47 patch set.
However, the following test case in the test_ddl_deparse_regress
module is broken again; I'll look into it later:
-- Test ALTER TABLE on inherited table
CREATE TABLE gtest30 (
a int,
b int GENERATED ALWAYS AS (a * 2) STORED
);
CREATE TABLE gtest30_1 () INHERITS (gtest30);
ALTER TABLE gtest30 ALTER COLUMN b DROP EXPRESSION;
Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| v47-0005-Skip-ALTER-TABLE-subcommands-generated-for-TableLike.patch | application/octet-stream | 2.2 KB |
| v47-0002-Support-DDL-replication.patch | application/octet-stream | 132.5 KB |
| v47-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v47-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.7 KB |
| v47-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 317.5 KB |
| v47-0006-Support-DDL-replication-of-alter-type-having-USING-e.patch | application/octet-stream | 9.0 KB |
| v47-0007-Introduce-the-test_ddl_deparse_regress-test-module.-.patch | application/octet-stream | 46.8 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-12 22:49:26 |
| Message-ID: | CAAD30UJG0ViTVdWNKu2rJXssmB5beCb+wEpph9ca6C3GC=Qvjw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Mon, Dec 12, 2022 at 12:40 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> Hi,
>
> Patch v46-0001 no longer compiles after the recent commit
> 840ff5f451cd9a391d237fc60894fea7ad82a189; I resolved the conflicts in
> the v47 patch set.
>
> However, the following test case in the test_ddl_deparse_regress
> module is broken again; I'll look into it later:
>
> -- Test ALTER TABLE on inherited table
> CREATE TABLE gtest30 (
> a int,
> b int GENERATED ALWAYS AS (a * 2) STORED
> );
> CREATE TABLE gtest30_1 () INHERITS (gtest30);
> ALTER TABLE gtest30 ALTER COLUMN b DROP EXPRESSION;
Fixed the above issue in the v48 patch set.
Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| v48-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v48-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.7 KB |
| v48-0005-Skip-ALTER-TABLE-subcommands-generated-for-TableLike.patch | application/octet-stream | 2.2 KB |
| v48-0002-Support-DDL-replication.patch | application/octet-stream | 132.5 KB |
| v48-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 317.5 KB |
| v48-0007-Introduce-the-test_ddl_deparse_regress-test-module.-.patch | application/octet-stream | 46.8 KB |
| v48-0006-Support-DDL-replication-of-alter-type-having-USING-e.patch | application/octet-stream | 9.0 KB |
| From: | li jie <ggysxcq(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-13 02:37:15 |
| Message-ID: | CAGfChW4vxVCgWs2=b+SDag0j3G-3Aqw5XvKnHVAReB-iysyj+g@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
I noticed that the issue of ownership seems to have not been considered.
For example, if a user 'a' from the publishing side creates a table t1,
the owner of t1 is not user 'a' after it is replicated to the subscribing side.
This is a situation that has not been encountered in previous DML replication.
I think the ownership relationship should not be lost,
and we should perhaps add it back,
like pg_dump "ALTER TABLE public.t1 OWNER TO a;",
even though we do not currently support the replication of USER.
Thoughts?
li jie.
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-13 09:17:32 |
| Message-ID: | CALDaNm2CazsD+Vibr8Khd_JXAvb1-FDtqhAaMS34HmdH0D9ATw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, 3 Nov 2022 at 13:11, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are some more review comments for the v32-0001 file ddl_deparse.c
>
> (This is a WIP since it is such a large file)
>
> ======
>
> 1. General - calling VA with 0 args
>
> There are some calls to new_objtree_VA() where 0 extra args are passed.
>
> e.g. See in deparse_AlterFunction
> * alterFunc = new_objtree_VA("ALTER FUNCTION", 0);
> * ObjTree *tmpobj = new_objtree_VA("%{type}T", 0);
> * tmpobj = new_objtree_VA(intVal(defel->arg) ? "RETURNS NULL ON NULL
> INPUT" : "CALLED ON NULL INPUT", 0);
> * tmpobj = new_objtree_VA(intVal(defel->arg) ? "SECURITY DEFINER" :
> "SECURITY INVOKER", 0);
> * tmpobj = new_objtree_VA(intVal(defel->arg) ? "LEAKPROOF" : "NOT
> LEAKPROOF", 0);
> * etc.
>
> Shouldn't all those just be calling the new_objtree() function instead
> of new_objtree_VA()?
>
> Probably there are more than just those cited - please search for others.
Modified wherever possible
> 2. General - when to call append_xxx functions?
>
> I did not always understand what seems like an arbitrary choice of
> function calls to append_xxx.
>
> e.g. Function deparse_AlterObjectSchemaStmt() does:
>
> + append_string_object(alterStmt, "%{identity}s", ident);
> +
> + append_string_object(alterStmt, "SET SCHEMA %{newschema}I", newschema);
>
> Why doesn't the above just use new_objtree_VA instead -- it seemed to
> me like the _VA function is underused in some places. Maybe search all
> the append_xxx usage - I suspect many of those can in fact be combined
> to use new_objtree_VA().
Modified wherever possible
> 3. General - extract object names from formats
>
> IIUC the calls to append_XXX__object will call deeper to
> append_object_to_format_string(), which has a main side-effect loop to
> extract the "name" part out of the sub_fmt string. But this logic all
> seems inefficient and unnecessary to me. I think in most (all?) cases
> the caller already knows what the object name should be, so instead of
> making code work to figure it out again, it can just be passed in the
> same way the _VA() function passes the known object name.
>
> There are many cases of this:
>
> e.g.
>
> BEFORE
> append_string_object(alterop, "(%{left_type}s", "NONE");
>
> AFTER - just change the signature to pass the known object name
> append_string_object(alterop, "(%{left_type}s", "left_type", "NONE");
Modified
> 4. General - fwd declares
>
> static void append_array_object(ObjTree *tree, char *name, List *array);
> static void append_bool_object(ObjTree *tree, char *name, bool value);
> static void append_float_object(ObjTree *tree, char *name, float8 value);
> static void append_null_object(ObjTree *tree, char *name);
> static void append_object_object(ObjTree *tree, char *name, ObjTree *value);
> static char *append_object_to_format_string(ObjTree *tree, char *sub_fmt);
> static void append_premade_object(ObjTree *tree, ObjElem *elem);
> static void append_string_object(ObjTree *tree, char *name, char *value);
>
>
> I think those signatures are misleading. AFAIK seems what is called
> the 'name' param above is often a 'sub_fmt' param in the
> implementation.
Modified
> 5. General - inconsistent append_array object calls.
>
> Sometimes enclosing brackets are included as part of the format string
> to be appended and other times they are appended separately. IIUC
> there is no difference but IMO the code should always be consistent to
> avoid it being confusing.
>
> e.g.1 (brackets in fmt)
> append_array_object(tmpobj, "(%{rettypes:, }s)", rettypes);
>
> e.g.2 (brackets appended separately)
> + append_format_string(tmpobj, "(");
> + append_array_object(tmpobj, "%{argtypes:, }T", arglist);
> + append_format_string(tmpobj, ")");
We cannot change it to a single call in all cases, because in some
cases the list can be NIL; if the list is empty, append_array_object
returns without appending anything, including the "(". For an empty
list we still want to emit "()", so the parentheses are appended
separately. I'm not making any changes for this; see the annotated
sketch below.
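To make the distinction concrete, here is the same pair of call
patterns annotated with why they differ (tmpobj, arglist and rettypes
are the variables from the hunks quoted above):

    /* List may be NIL: keep the parentheses outside the array format. */
    append_format_string(tmpobj, "(");
    append_array_object(tmpobj, "%{argtypes:, }T", arglist); /* no-op if NIL */
    append_format_string(tmpobj, ")");
    /* An empty argument list therefore still deparses as "()". */

    /* List known to be non-empty: a single call is enough. */
    append_array_object(tmpobj, "(%{rettypes:, }s)", rettypes);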
> 6. General - missing space before '('
>
> I noticed a number of comment where there is a space missing before a '('.
> Here are some examples:
>
> - * An element of an object tree(ObjTree).
> - * typmod at the middle of name(e.g. TIME(6) with time zone ). We cannot
> - * Sequence for IDENTITY COLUMN output separately(via CREATE TABLE or
> - * Sequence for IDENTITY COLUMN output separately(via CREATE TABLE or
>
> Search all the patch-code to find others and add missing spaces.
Modified
> 7. General - Verbose syntax comments
>
> Some (but not all) of the deparse_XXX functions have a comment
> describing the verbose syntax.
>
> e.g.
> /*
> * Verbose syntax
> *
> * CREATE %{default}s CONVERSION %{identity}D FOR %{source}L TO %{dest}L
> * FROM %{function}D
> */
>
> These are helpful for understanding the logic of the function, so IMO
> similar comments should be written for *all* of the deparse_xxx
> function.
>
> And maybe a more appropriate place to put these comments is in the
> function header comment.
Modified
> 8. new_objtree_VA
>
> + /*
> + * For all other param types there must be a value in the varargs.
> + * Fetch it and add the fully formed subobject into the main object.
> + */
> + switch (type)
>
> What does the comment mean when it says - for all "other" param types?
I felt the comment meant "For all param types other than ObjTypeNull",
changed it accordingly.
> 9. objtree_to_jsonb_element
>
> + ListCell *cell;
> + JsonbValue val;
>
> The 'cell' is only for the ObjTypeArray so consider declaring it for
> that case only.
Modified
> 10. obtainConstraints
>
> + else
> + {
> + Assert(OidIsValid(domainId));
>
> Looks like the above Assert is unnecessary because the earlier Assert
> (below) already ensures this:
> + /* Only one may be valid */
> + Assert(OidIsValid(relationId) ^ OidIsValid(domainId));
Removed it
> 11. pg_get_indexdef_detailed
>
> + /* Output tablespace */
> + {
> + Oid tblspc;
> +
> + tblspc = get_rel_tablespace(indexrelid);
> + if (OidIsValid(tblspc))
> + *tablespace = pstrdup(quote_identifier(get_tablespace_name(tblspc)));
> + else
> + *tablespace = NULL;
> + }
> +
> + /* Report index predicate, if any */
> + if (!heap_attisnull(ht_idx, Anum_pg_index_indpred, NULL))
> + {
> + Node *node;
> + Datum predDatum;
> + char *predString;
> +
> + /* Convert text string to node tree */
> + predDatum = SysCacheGetAttr(INDEXRELID, ht_idx,
> + Anum_pg_index_indpred, &isnull);
> + Assert(!isnull);
> + predString = TextDatumGetCString(predDatum);
> + node = (Node *) stringToNode(predString);
> + pfree(predString);
> +
> + /* Deparse */
> + *whereClause =
> + deparse_expression(node, context, false, false);
> + }
> + else
> + *whereClause = NULL;
>
> Maybe just assign defaults:
> *tablespace = NULL;
> *whereClause = NULL;
>
> then overwrite those defaults, so can avoid the 'else' code.
Modified
> 12. stringify_objtype
>
> +/*
> + * Return the given object type as a string.
> + */
> +static const char *
> +stringify_objtype(ObjectType objtype)
>
> 12a.
> This statics function feels like it belongs more in another module as
> a utility function.
Moved it
> 12b.
> Actually, this function looks like it might be more appropriate just
> as a static lookup array/map of names keys by the ObjectType, and
> using a StaticAssertDecl for sanity checking.
We will not be able to use a static array map in this case because the
function can return different strings for the same ObjectType: for
example, OBJECT_COLUMN returns "TABLE" or "COLUMN" depending on whether
isgrant is true, and similarly for OBJECT_FOREIGN_SERVER. I have not
made any changes for this; a sketch of the problematic cases follows.
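A short sketch of the cases that rule out a plain lookup array; the
isgrant parameter follows the updated function described above, and the
non-grant strings here are illustrative:

static const char *
stringify_objtype(ObjectType objtype, bool isgrant)
{
    switch (objtype)
    {
        case OBJECT_TABLE:
            return "TABLE";
        case OBJECT_COLUMN:
            /* GRANT targets the whole table; other DDL names the column */
            return isgrant ? "TABLE" : "COLUMN";
        case OBJECT_FOREIGN_SERVER:
            /* GRANT ... ON FOREIGN SERVER vs. CREATE/ALTER SERVER */
            return isgrant ? "FOREIGN SERVER" : "SERVER";

            /* ... the remaining object types map to a single string ... */

        default:
            elog(ERROR, "unsupported object type %d", (int) objtype);
            return NULL;        /* keep compiler quiet */
    }
}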
> 13. deparse_GrantStmt
>
> + /*
> + * If there are no objects from "ALL ... IN SCHEMA" to be granted, then we
> + * need not do anything.
> + */
> + if (istmt->objects == NIL)
> + return NULL;
>
> "we need not do anything." -> "nothing to do."
Modified
> 14. deparse_GrantStmt
>
> + switch (istmt->objtype)
> + {
> + case OBJECT_COLUMN:
> + case OBJECT_TABLE:
> + objtype = "TABLE";
> + classId = RelationRelationId;
> + break;
> + case OBJECT_SEQUENCE:
> + objtype = "SEQUENCE";
> + classId = RelationRelationId;
> + break;
> + case OBJECT_DOMAIN:
> + objtype = "DOMAIN";
> + classId = TypeRelationId;
> + break;
> + case OBJECT_FDW:
> + objtype = "FOREIGN DATA WRAPPER";
> + classId = ForeignDataWrapperRelationId;
> + break;
> + case OBJECT_FOREIGN_SERVER:
> + objtype = "FOREIGN SERVER";
> + classId = ForeignServerRelationId;
> + break;
> + case OBJECT_FUNCTION:
> + objtype = "FUNCTION";
> + classId = ProcedureRelationId;
> + break;
> + case OBJECT_PROCEDURE:
> + objtype = "PROCEDURE";
> + classId = ProcedureRelationId;
> + break;
> + case OBJECT_ROUTINE:
> + objtype = "ROUTINE";
> + classId = ProcedureRelationId;
> + break;
> + case OBJECT_LANGUAGE:
> + objtype = "LANGUAGE";
> + classId = LanguageRelationId;
> + break;
> + case OBJECT_LARGEOBJECT:
> + objtype = "LARGE OBJECT";
> + classId = LargeObjectRelationId;
> + break;
> + case OBJECT_SCHEMA:
> + objtype = "SCHEMA";
> + classId = NamespaceRelationId;
> + break;
> + case OBJECT_TYPE:
> + objtype = "TYPE";
> + classId = TypeRelationId;
> + break;
> + case OBJECT_DATABASE:
> + case OBJECT_TABLESPACE:
> + objtype = "";
> + classId = InvalidOid;
> + elog(ERROR, "global objects not supported");
> + break;
> + default:
> + elog(ERROR, "invalid OBJECT value %d", istmt->objtype);
> + }
>
>
> Shouldn't code be calling to the other function stringify_objtype() to
> do some of this?
Modified
> 15.
>
> + grantStmt = new_objtree_VA(fmt, 0);
> +
> + /* build a list of privileges to grant/revoke */
> + if (istmt->all_privs)
> + {
> + tmp = new_objtree_VA("ALL PRIVILEGES", 0);
>
> Here are some more examples of the _VA function being called with 0
> args. Why use _VA function?
This was already modified
> 16.
>
> + list = NIL;
> +
> + if (istmt->privileges & ACL_INSERT)
> + list = lappend(list, new_string_object("INSERT"));
> + if (istmt->privileges & ACL_SELECT)
> + list = lappend(list, new_string_object("SELECT"));
> + if (istmt->privileges & ACL_UPDATE)
> + list = lappend(list, new_string_object("UPDATE"));
> + if (istmt->privileges & ACL_DELETE)
> + list = lappend(list, new_string_object("DELETE"));
> + if (istmt->privileges & ACL_TRUNCATE)
> + list = lappend(list, new_string_object("TRUNCATE"));
> + if (istmt->privileges & ACL_REFERENCES)
> + list = lappend(list, new_string_object("REFERENCES"));
> + if (istmt->privileges & ACL_TRIGGER)
> + list = lappend(list, new_string_object("TRIGGER"));
> + if (istmt->privileges & ACL_EXECUTE)
> + list = lappend(list, new_string_object("EXECUTE"));
> + if (istmt->privileges & ACL_USAGE)
> + list = lappend(list, new_string_object("USAGE"));
> + if (istmt->privileges & ACL_CREATE)
> + list = lappend(list, new_string_object("CREATE"));
> + if (istmt->privileges & ACL_CREATE_TEMP)
> + list = lappend(list, new_string_object("TEMPORARY"));
> + if (istmt->privileges & ACL_CONNECT)
> + list = lappend(list, new_string_object("CONNECT"));
>
> 16a.
> Shouldn't this be trying to re-use code like privilege_to_string()
> mapping function already in aclchk.c to get all those ACL strings?
Modified
> 16b.
> Is it correct that ACL_SET and ACL_ALTER_SYSTEM are missing?
Yes, this is intentional, as event triggers are not supported for
global objects.
> 17.
>
> The coding style is inconsistent in this function...
>
> For the same things - sometimes use the ternary operator; sometimes use if/else.
>
> e.g.1
> + append_string_object(grantStmt, "%{grant_option}s",
> + istmt->grant_option ? "WITH GRANT OPTION" : "");
>
> e.g.2
> + if (istmt->behavior == DROP_CASCADE)
> + append_string_object(grantStmt, "%{cascade}s", "CASCADE");
> + else
> + append_string_object(grantStmt, "%{cascade}s", "");
Modified
> 18. deparse_AlterOpFamily
>
> + tmpobj2 = new_objtree_VA("FOR ORDER BY", 0);
> + append_object_object(tmpobj2, "%{opfamily}D",
> + new_objtree_for_qualname_id(OperatorFamilyRelationId,
> + oper->sortfamily));
>
> Why isn't something like this combined to be written as a signle
> new_objtree_VA call?
Modified
> 19. deparse_Type_Storage
>
> + tmpstr = psprintf("%s", str);
> +
> + fmt = "STORAGE = %{value}s";
> +
> + storage = new_objtree_VA(fmt, 2,
> + "clause", ObjTypeString, "storage",
> + "value", ObjTypeString, tmpstr);
>
> 19a.
> What is the purpose of tmpstr? Seems unnecessary
It is not required; removed it.
> 19b.
> What is the purpose of separate 'fmt' var? Why not just pass format
> string as a parameter literal to the new_objtree_VA()
Modified
> 20. deparse_CreateConversion
>
> + /* Add the DEFAULT clause */
> + append_string_object(ccStmt, "default",
> + conForm->condefault ? "DEFAULT" : "");
>
> 20a.
> Is that code correct? I thought the fmt should look like
> "%{default}s", otherwise won't the resulting string object have no
> name?
I have changed it to use new_objtree_VA and changed the format to %{default}s.
> 20b.
> Anyway, it does not seem to match what the preceding verbose syntax
> comment says.
Modified
> 21.
>
> +
> +
> + /* Add the DEFAULT clause */
> + append_string_object(ccStmt, "default",
> + conForm->condefault ? "DEFAULT" : "");
> +
> + tmpObj = new_objtree_for_qualname(conForm->connamespace,
> NameStr(conForm->conname));
> + append_object_object(ccStmt, "CONVERSION %{identity}D", tmpObj);
> + append_string_object(ccStmt, "FOR %{source}L", (char *)
> + pg_encoding_to_char(conForm->conforencoding));
> + append_string_object(ccStmt, "TO %{dest}L", (char *)
> + pg_encoding_to_char(conForm->contoencoding));
> + append_object_object(ccStmt, "FROM %{function}D",
> + new_objtree_for_qualname_id(ProcedureRelationId,
> + conForm->conproc));
>
> I don't really understand why all this is not written instead using a
> single new_objtree_VA() call.
Modified
> 22. deparse_CreateEnumStmt
>
> + enumtype = new_objtree("CREATE TYPE");
> + append_object_object(enumtype, "%{identity}D",
> + new_objtree_for_qualname_id(TypeRelationId,
> + objectId));
> +
> + values = NIL;
> + foreach(cell, node->vals)
> + {
> + String *val = lfirst_node(String, cell);
> +
> + values = lappend(values, new_string_object(strVal(val)));
> + }
> +
> + append_array_object(enumtype, "AS ENUM (%{values:, }L)", values);
> + return enumtype;
>
> Ditto. I don't really understand why all this is not written instead
> using a single new_objtree_VA() call.
Modified
> 23. deparse_CreateExtensionStmt
>
> + extStmt = new_objtree("CREATE EXTENSION");
> +
> + append_string_object(extStmt, "%{if_not_exists}s",
> + node->if_not_exists ? "IF NOT EXISTS" : "");
> +
> + append_string_object(extStmt, "%{name}I", node->extname);
>
> Ditto. I don't really understand why all this is not written instead
> using a single new_objtree_VA() call.
Modified
> 24. deparse_FdwOptions
>
> + tmp = new_objtree_VA("OPTIONS", 0);
>
> Isn't it better to call other function instead of passing zero params
> to this one?
This was already fixed
> 25. deparse_CreateFdwStmt
>
> 25a.
> + /* add HANDLER clause */
> + if (fdwForm->fdwhandler == InvalidOid)
> + tmp = new_objtree_VA("NO HANDLER", 0);
> + else
>
> Isn't it better to call other function instead of passing zero params
> to this one?
This is fixed already
> 25b.
> + /* add VALIDATOR clause */
> + if (fdwForm->fdwvalidator == InvalidOid)
> + tmp = new_objtree_VA("NO VALIDATOR", 0);
>
> Ditto #25a
This is fixed already
>
> 25c.
> Both above should use OidIsValid macro.
Modified
> 26. deparse_AlterFdwStmt
>
> 26a.
> + /* add HANDLER clause */
> + if (fdwForm->fdwhandler == InvalidOid)
> + tmp = new_objtree_VA("NO HANDLER", 0);
>
> Ditto #25a
This is fixed already
> 26b.
> + /* add VALIDATOR clause */
> + if (fdwForm->fdwvalidator == InvalidOid)
> + tmp = new_objtree_VA("NO VALIDATOR", 0);
>
> Ditto #25a
This is fixed already
> 26c.
> Both above should use OidIsValid macro.
Modified
> 27. deparse_CreateRangeStmt
>
> + /* SUBTYPE */
> + tmp = new_objtree_for_qualname_id(TypeRelationId,
> + rangeForm->rngsubtype);
> + tmp = new_objtree_VA("SUBTYPE = %{type}D",
> + 2,
> + "clause", ObjTypeString, "subtype",
> + "type", ObjTypeObject, tmp);
> + definition = lappend(definition, new_object_object(tmp));
>
>
> The reusing of 'tmp' variable seems a bit sneaky to me. Perhaps using
> 'tmp' and 'tmp_qualid' might be a more readable way to go here.
Removed usage of tmp
> 28. deparse_AlterEnumStmt
>
> + if (node->newValNeighbor)
> + {
> + append_string_object(alterEnum, "%{after_or_before}s",
> + node->newValIsAfter ? "AFTER" : "BEFORE");
> + append_string_object(alterEnum, "%{neighbour}L", node->newValNeighbor);
> + }
>
> Has a mix of US and UK spelling of neighbor/neighbour?
Modified to neighbor
Thanks for the comments; the attached v49 patch has the changes for the same.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v49-0004-Test-cases-for-DDL-replication.patch | text/x-patch | 24.6 KB |
| v49-0001-Functions-to-deparse-DDL-commands.patch | text/x-patch | 324.9 KB |
| v49-0005-Skip-ALTER-TABLE-subcommands-generated-for-Table.patch | text/x-patch | 2.2 KB |
| v49-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | text/x-patch | 15.7 KB |
| v49-0002-Support-DDL-replication.patch | text/x-patch | 132.5 KB |
| v49-0006-Support-DDL-replication-of-alter-type-having-USI.patch | text/x-patch | 9.0 KB |
| v49-0007-Introduce-the-test_ddl_deparse_regress-test-modu.patch | text/x-patch | 46.7 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | li jie <ggysxcq(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-13 22:25:48 |
| Message-ID: | CAAD30ULvGr2McLEe=aguK3z+BvtJ8PY5GLVbAHDcorAaAUFmBQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Attaching V50 patch set which adds support for SECURITY LABEL commands.
Also fixed an issue in deparse_AlterRelation from the last conflict resolution.
On Mon, Dec 12, 2022 at 9:37 PM li jie <ggysxcq(at)gmail(dot)com> wrote:
>
> I noticed that the issue of ownership seems to have not been considered.
> For example, if a user 'a' from the publishing side creates a table t1,
> the owner of t1 is not user 'a' after it is replicated to the subscribing side.
> This is a situation that has not been encountered in previous DML replication.
>
> I think the ownership relationship should not be lost,
> and we should perhaps add it back,
> like pg_dump "ALTER TABLE public.t1 OWNER TO a;",
> even though we do not currently support the replication of USER.
I agree that this is a useful scenario. On the other hand, I think it
also makes sense to keep the ownership of objects to the subscription
owner in other use cases. Maybe we can make it optional whether to
inherit the original ownership of the publisher on the subscriber
using a new subscription_parameter. For example, original_ownership:
CREATE SUBSCRIPTION mysub CONNECTION 'conninfo' PUBLICATION mypub WITH
(original_ownership=true);
Then we can change the apply worker to set the ownership of new
objects accordingly. We also need to make the owner information
available in the logicalddlmsg, currently we don't log this
information.
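As a very rough sketch of the apply side (everything here is an assumption,
not part of the current patch set: the originalownership subscription field,
the owner string carried in the DDL message, and the surrounding code), the
apply worker could switch to the original owner before executing the
replicated command:

if (MySubscription->originalownership && ddlmsg_owner != NULL)
{
    Oid owner_oid = get_role_oid(ddlmsg_owner, false);

    /* Execute the replicated DDL as the original owner role;
     * restoring the previous user id afterwards is omitted here. */
    SetUserIdAndSecContext(owner_oid, SECURITY_LOCAL_USERID_CHANGE);
}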
Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| v50-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v50-0002-Support-DDL-replication.patch | application/octet-stream | 132.5 KB |
| v50-0005-Skip-ALTER-TABLE-subcommands-generated-for-TableLike.patch | application/octet-stream | 2.2 KB |
| v50-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.7 KB |
| v50-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 326.6 KB |
| v50-0006-Support-DDL-replication-of-alter-type-having-USING-e.patch | application/octet-stream | 9.0 KB |
| v50-0007-Introduce-the-test_ddl_deparse_regress-test-module.-.patch | application/octet-stream | 46.7 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | li jie <ggysxcq(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-14 10:34:44 |
| Message-ID: | CALDaNm0jtir-LffQtwxr6aD5XbiyWcXdsTDQbDOAFK=juP_zXg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, 14 Dec 2022 at 03:56, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> Attaching V50 patch set which adds support for SECURITY LABEL commands.
> Also fixed an issue in deparse_AlterRelation from the last conflict resolution.
>
> On Mon, Dec 12, 2022 at 9:37 PM li jie <ggysxcq(at)gmail(dot)com> wrote:
> >
> > I noticed that the issue of ownership seems to have not been considered.
> > For example, if a user 'a' from the publishing side creates a table t1,
> > the owner of t1 is not user 'a' after it is replicated to the subscribing side.
> > This is a situation that has not been encountered in previous DML replication.
> >
> > I think the ownership relationship should not be lost,
> > and we should perhaps add it back,
> > like pg_dump "ALTER TABLE public.t1 OWNER TO a;",
> > even though we do not currently support the replication of USER.
>
> I agree that this is a useful scenario. On the other hand, I think it
> also makes sense to keep the ownership of objects to the subscription
> owner in other use cases. Maybe we can make it optional whether to
> inherit the original ownership of the publisher on the subscriber
> using a new subscription_parameter. For example, original_ownership:
> CREATE SUBSCRIPTION mysub CONNECTION 'conninfo' PUBLICATION mypub WITH
> (original_ownership=true);
>
> Then we can change the apply worker to set the ownership of new
> objects accordingly. We also need to make the owner information
> available in the logicalddlmsg, currently we don't log this
> information.
The patch was not compiling because of the recent commit
60684dd834a222fefedd49b19d1f0a6189c1632e, which removed ACL_VACUUM and
ACL_ANALYZE. I have updated the patch to handle these changes in the
DDL replication of the GRANT command accordingly. The attached patch
has the changes for the same.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v50-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | text/x-patch | 15.7 KB |
| v50-0004-Test-cases-for-DDL-replication.patch | text/x-patch | 24.6 KB |
| v50-0002-Support-DDL-replication.patch | text/x-patch | 132.5 KB |
| v50-0001-Functions-to-deparse-DDL-commands.patch | text/x-patch | 326.5 KB |
| v50-0005-Skip-ALTER-TABLE-subcommands-generated-for-Table.patch | text/x-patch | 2.2 KB |
| v50-0006-Support-DDL-replication-of-alter-type-having-USI.patch | text/x-patch | 9.0 KB |
| v50-0007-Introduce-the-test_ddl_deparse_regress-test-modu.patch | text/x-patch | 46.7 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | li jie <ggysxcq(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-15 05:36:41 |
| Message-ID: | CAAD30ULvFo4NDQGEb7QmtV12AwvuacHBjDGkgDNwb-UTauvOXA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hi,
Attaching the V51 patch, which adds support for ALTER DOMAIN DROP
CONSTRAINT. Deparsing and replication of CONSTRAINT otherwise seem
complete. Also fixed an issue in the test_ddl_deparse_regress module
and added two test files, alter_table and domain.
Regards,
Zheng
On Wed, Dec 14, 2022 at 5:34 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Wed, 14 Dec 2022 at 03:56, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> >
> > Attaching V50 patch set which adds support for SECURITY LABEL commands.
> > Also fixed an issue in deparse_AlterRelation from the last conflict resolution.
> >
> > On Mon, Dec 12, 2022 at 9:37 PM li jie <ggysxcq(at)gmail(dot)com> wrote:
> > >
> > > I noticed that the issue of ownership seems to have not been considered.
> > > For example, if a user 'a' from the publishing side creates a table t1,
> > > the owner of t1 is not user 'a' after it is replicated to the subscribing side.
> > > This is a situation that has not been encountered in previous DML replication.
> > >
> > > I think the ownership relationship should not be lost,
> > > and we should perhaps add it back,
> > > like pg_dump "ALTER TABLE public.t1 OWNER TO a;",
> > > even though we do not currently support the replication of USER.
> >
> > I agree that this is a useful scenario. On the other hand, I think it
> > also makes sense to keep the ownership of objects to the subscription
> > owner in other use cases. Maybe we can make it optional whether to
> > inherit the original ownership of the publisher on the subscriber
> > using a new subscription_parameter. For example, original_ownership:
> > CREATE SUBSCRIPTION mysub CONNECTION 'conninfo' PUBLICATION mypub WITH
> > (original_ownership=true);
> >
> > Then we can change the apply worker to set the ownership of new
> > objects accordingly. We also need to make the owner information
> > available in the logicalddlmsg, currently we don't log this
> > information.
>
> The patch was not compiling because of recent commit
> 60684dd834a222fefedd49b19d1f0a6189c1632e which removes ACL_VACUUM and
> ACL_ANALYZE, updated the patch to handle the changes in ddl
> replication of grant command accordingly. The attached patch has the
> changes for the same.
>
> Regards,
> Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v51-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.7 KB |
| v51-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v51-0002-Support-DDL-replication.patch | application/octet-stream | 132.5 KB |
| v51-0005-Skip-ALTER-TABLE-subcommands-generated-for-TableLike.patch | application/octet-stream | 2.2 KB |
| v51-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 327.1 KB |
| v51-0006-Support-DDL-replication-of-alter-type-having-USING-e.patch | application/octet-stream | 9.0 KB |
| v51-0007-Introduce-the-test_ddl_deparse_regress-test-module.-.patch | application/octet-stream | 64.6 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-15 16:24:44 |
| Message-ID: | CALDaNm3q_2Qfo4XbFkBT8mPz_ALS2MyuR=isP7_8Pz27JnLHiw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, 11 Nov 2022 at 10:17, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are more review comments for the v32-0001 file ddl_deparse.c
>
> This completes my first review pass over this overly large file.
>
> This review has taken a long time, so for any of my review comments
> (in this and previous posts) that get rejected, please reply citing
> the rejected reference numbers, because I hope to avoid spending
> multiple days (in a future review) trying to reconcile what was
> addressed vs what was not addressed. TIA.
>
> *** NOTE - my review post became too big, so I split it into smaller parts.
>
> THIS IS PART 1 OF 4.
>
> ======
>
> src/backend/commands/ddl_deparse.c
>
> G.1. GENERAL _VA args wrapping
>
> + tmp = new_objtree_VA("WITH GRANT OPTION",
> + 1, "present", ObjTypeBool,
> + stmt->action->grant_option);
>
> In general, I think all these _VA() style function calls are easier to
> read if you can arrange to put each of the argument names on a new
> line instead of just wrapping them randomly.
>
> So the above would be better as:
>
> tmp = new_objtree_VA("WITH GRANT OPTION", 1,
> "present", ObjTypeBool, stmt->action->grant_option);
>
> Please search/modify all cases of new_objtree_VA like this.
Modified
> G.2. GENERAL - "type" object
>
> There are many functions that insert a "type" object for some purpose:
>
> e.g.
> + tmpobj = new_objtree_VA("DETACH PARTITION %{partition_identity}D FINALIZE", 2,
> + "type", ObjTypeString, "detach partition finalize",
> + "partition_identity", ObjTypeObject,
> + new_objtree_for_qualname_id(RelationRelationId,
> + sub->address.objectId));
>
> e.g.
> + tmpobj = new_objtree_VA(fmtstr, 2,
> + "type", ObjTypeString, "add column",
> + "definition", ObjTypeObject, tree);
>
> I'm not sure yet what these "type" objects are used for, but I felt
> that these unsubstituted values should look slightly more like enum
> values, and slightly less like real SQL syntax.
>
> For example - maybe do like this (for the above):
>
> "detach partition finalize" -> "type_detach_partition_finalize"
> "add column" -> "type_add_column"
> etc.
I felt this is mainly useful for handling the case where the publisher
is running on a higher version and the subscriber is running on a lower
version; this feature may or may not be part of the first version, and
we might be removing this code from the final patch. I have not made any
change for this. We will handle it at a later point, depending on whether
cross-version handling is required in the patch to be committed.
> G.3. GENERAL - JSON deparsed structures should be documented
>
> AFAICT there are mixtures of different JSON structure styles at play
> in this module. Sometimes there are trees with names and sometimes
> not, sometimes there are "present" objects and sometimes not.
> Sometimes entire trees seemed unnecessary to me. It feels quite
> arbitrary in places but it's quite hard to compare them because
> everything is spread across 9000+ lines.
>
> IMO all these deparse structures ought to be documented. Then I think
> it will become apparent that lots of them are inconsistent with the
> others. Even if such documentation is ultimately not needed by
> end-users, I think it would be a very valuable internal design
> accompaniment to this module, and it would help a lot for
> reviews/maintenance/bug prevention etc. Better late than never.
There is "type" and "present" which might confuse the user, this is
required for handling when the publisher is running on a higher
version and the subscriber is running on a lower version, this feature
may or may not be part of the first version. We might be removing this
code from the final patch. I have not documented this part, the others
I have documented. Let me know if you are looking for adding comments
for some others particularly.
> G.4 GENERAL - Underuse of _VA() function.
>
> (Probably I've mentioned this before in previous review comments, but
> I keep encountering this many times).
>
> The json is sort of built up part by part and objects are appended ...
> it was probably easier to think about each part during coding but OTOH
> I think this style is often unnecessary. IMO most times the function
> can be far simpler just by gathering together all the necessary values
> and then using a single big new_objtree_VA() call to deparse the
> complete format in one call. I think it could also shave 100s of lines
> of code from the module.
Modified
> G.5 GENERAL - Inconsistent function comment wording.
>
> The function comments are worded in different ways...
>
> "Given a XXX OID and the parse tree that created it, return an ObjTree
> representing the creation command."
>
> versus
>
> "Given a XXX OID and the parse tree that created it, return the JSON
> blob representing the creation command."
>
> Please use consistent wording throughout.
Modified
> G.6 GENERAL - present=false
>
> There are many calls that do like:
> append_bool_object(tmpobj, "present", false);
>
> I was thinking the code would be cleaner if there was a wrapper function like:
>
> static void
> append_not_present(ObjTree *objtree)
> {
> append_bool_object(objtree, "present", false);
> }
Modified
> G.7 GENERAL - psprintf format strings
>
> There are quite a few places where the format string is
> pre-constructed using psprintf.
>
> e.g.
> + fmt = psprintf("ALTER %s %%{identity}s OWNER TO %%{newowner}I",
> + stringify_objtype(node->objectType));
> +
> + ownerStmt = new_objtree_VA(fmt, 2,
> + "identity", ObjTypeString,
> + getObjectIdentity(&address, false),
> + "newowner", ObjTypeString,
> + get_rolespec_name(node->newowner));
>
> It's not entirely clear to me why this kind of distinction is even
> made, or even what are the rules governing the choice. AFAICT this
> same result could be achieved by using another string substitution
> marker. So why not do it that way instead of mixing different styles?
>
> IMO many/most of the psprintf can be removed.
>
> e.g. I mean something like this (for the above example):
>
> fmt = "ALTER %{obj_type}s %{identity}s OWNER TO %{newowner}I";
>
> ownerStmt = new_objtree_VA(fmt, 3,
> "obj_type", ObjTypeString, stringify_objtype(node->objectType),
> "identity", ObjTypeString, getObjectIdentity(&address, false),
> "newowner", ObjTypeString, get_rolespec_name(node->newowner));
Modified wherever possible
> G.8 GENERAL - Inconsistent OID/oid in error messages.
>
> errmsg("role with OID %u does not exist", roleoid)));
> elog(ERROR, "cache lookup failed for collation with OID %u", objectId);
> elog(ERROR, "cache lookup failure for function with OID %u",
> elog(ERROR, "cache lookup failed for schema with OID %u",
> errmsg("role with OID %u does not exist", istmt->grantor_uid)));
> elog(ERROR, "cache lookup failed for operator with OID %u", objectId);
> elog(ERROR, "cache lookup failed for type with OID %u", objectId);
> elog(ERROR, "cache lookup failed for conversion with OID %u", objectId);
> elog(ERROR, "cache lookup failed for extension with OID %u",
> elog(ERROR, "cache lookup failed for extension with OID %u",
> elog(ERROR, "cache lookup failed for cast with OID %u", objectId);
> elog(ERROR, "cache lookup failed for domain with OID %u", objectId);
> elog(ERROR, "cache lookup failure for function with OID %u",
> elog(ERROR, "cache lookup failure for language with OID %u",
> elog(ERROR, "null prosrc in function with OID %u", objectId);
> elog(ERROR, "cache lookup failed for opclass with OID %u", opcoid);
> elog(ERROR, "cache lookup failed for operator family with OID %u",
> opcForm->opcfamily);
> elog(ERROR, "cache lookup failed for operator family with OID %u", objectId);
> elog(ERROR, "cache lookup failed for domain with OID %u",
> elog(ERROR, "cache lookup failed for collation with OID %u", objectId);
> elog(ERROR, "cache lookup failed for operator with OID %u", objectId);
> elog(ERROR, "cache lookup failed for type with OID %u", objectId);
> elog(ERROR, "cache lookup failed for text search parser with OID %u",
> elog(ERROR, "cache lookup failed for text search dictionary " "with
> OID %u", objectId);
> elog(ERROR, "cache lookup failed for text search template with OID %u",
> elog(ERROR, "cache lookup failed for text search dictionary " "with
> OID %u", objectId);
> elog(ERROR, "cache lookup failed for opclass with OID %u",
> elog(ERROR, "cache lookup failed for operator family with OID %u",
> elog(ERROR, "cache lookup failure for transform with OID %u",
> elog(ERROR, "cache lookup failure for language with OID %u",
> elog(ERROR, "cache lookup failure for function with OID %u",
> elog(ERROR, "cache lookup failure for function with OID %u",
> elog(ERROR, "cache lookup failed for rewrite rule for view with OID
> %u", viewoid)
>
> elog(ERROR, "cache lookup failed for range with type oid %u",
> elog(ERROR, "cache lookup failed for rewrite rule with oid %u",
>
> G.8a.
> Most are uppercase 'OID'. A few are lowercase 'oid'
Modified
> G.8b.
> There is a mixture of "cache lookup failed" and "cache lookup failure"
> -- should all be the same.
Modified
> G.8c.
> A few above (e.g. role) have a different message text. Shouldn't those
> also be "cache lookup failed"?
Modified
> G.9 GENERAL - ObjTree variables
>
> Often the ObjTree variable (for the deparse_XXX function return) is
> given the name of the statement it is creating.
>
> Although it is good to be descriptive, often there is no need for long
> variable names (e.g. 'createTransform' etc), because there is no
> ambiguity anyway and it just makes for extra code characters and
> unnecessary wrapping. IMO it would be better to just call everything
> some short but *consistent* name across every function -- like 'stmt'
> or 'json_ddl' or 'root' or 'ret' ... or whatever.
Modified
Thanks for the comments; the attached v52 patch has the changes for the same.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v52-0005-Skip-ALTER-TABLE-subcommands-generated-for-Table.patch | text/x-patch | 2.2 KB |
| v52-0004-Test-cases-for-DDL-replication.patch | text/x-patch | 24.6 KB |
| v52-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | text/x-patch | 15.7 KB |
| v52-0001-Functions-to-deparse-DDL-commands.patch | text/x-patch | 323.0 KB |
| v52-0002-Support-DDL-replication.patch | text/x-patch | 132.5 KB |
| v52-0006-Support-DDL-replication-of-alter-type-having-USI.patch | text/x-patch | 9.0 KB |
| v52-0007-Introduce-the-test_ddl_deparse_regress-test-modu.patch | text/x-patch | 64.9 KB |
| From: | li jie <ggysxcq(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-19 10:02:17 |
| Message-ID: | CAGfChW5Qbb8WKznzVqp+cyTqkzLt_vEeSe14td14et6HjnH7qw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
I have presented some comments below:
1. AT_AddColumn
> + tmpobj = new_objtree_VA("ADD %{objtype}s %{definition}s", 3,
[ IF NOT EXISTS ] is missing here.
2. AT_DropColumn
> + tmpobj = new_objtree_VA("DROP %{objtype}s %{column}I", 3,
[ IF EXISTS ] is missing here.
3. AT_DropConstraint
> + tmpobj = new_objtree_VA("DROP CONSTRAINT %{constraint}I", 2,
[ IF EXISTS ] is missing here.
4. AT_DetachPartition
> + tmpobj = new_objtree_VA("DETACH PARTITION %{partition_identity}D", 2,
[ CONCURRENTLY | FINALIZE ] is missing here.
5. deparse_CreateSeqStmt
> + ret = new_objtree_VA("CREATE %{persistence}s SEQUENCE %{identity}D %{definition: }s", 3,
[ IF NOT EXISTS ] is missing here.
6. deparse_IndexStmt
> + ret = new_objtree_VA("CREATE %{unique}s INDEX %{concurrently}s %{if_not_exists}s %{name}I ON %{table}D USING %{index_am}s (%{definition}s)", 7,
[ INCLUDE ] and [ ONLY ] are missing here.
7. deparse_RuleStmt
> + foreach(cell, actions)
> + list = lappend(list, new_string_object(lfirst(cell)));
if (actions == NIL)
list = lappend(list, new_string_object("NOTHING"));
else
{
foreach(cell, actions)
list = lappend(list, new_string_object(lfirst(cell)));
}
8. AT_AddIndexConstraint
> + tmpobj = new_objtree_VA("ADD CONSTRAINT %{name}I %{constraint_type}s USING INDEX %{index_name}I %{deferrable}s %{init_deferred}s", 6,
> + "type", ObjTypeString, "add constraint using index",
> + "name", ObjTypeString, get_constraint_name(constrOid),
> + "constraint_type", ObjTypeString,
> + istmt->deferrable ? "DEFERRABLE" : "NOT DEFERRABLE",
"constraint_type", ObjTypeString,
istmt->primary ? "PRIMARY KEY" : "UNIQUE",
9. regress test
Zheng Li <zhengli10(at)gmail(dot)com> 于2022年12月12日周一 12:58写道:
>
> Hi,
>
> Attached please find the DDL deparser testing module in the v45-0007
> patch, this testing module is written by Runqi Tian in [1] with minor
> modification from myself. I think we can
> start adding more tests to the module now that we're getting close to
> finish implementing the DDL deparser.
>
> This testing module ddl_deparse_regress aims to achieve the following
> four testing goals for the DDL deparser:
> 1. Test that the generated JSON blob is expected using SQL tests.
> 2. Test that the re-formed DDL command is expected using SQL tests.
> 3. Test that the re-formed DDL command has the same effect as the
> original command
> by comparing the results of pg_dump, using the SQL tests in 1 and 2.
> 4. Test that any new DDL syntax is handled by the DDL deparser by
> capturing and deparsing
> DDL commands ran by pg_regress.
>
> 1 and 2 is tested with SQL tests, by comparing the deparsed JSON blob
> and the re-formed command.
> 3 is tested with TAP framework in t/001_compare_dumped_results.pl
> 4 is tested with TAP framework and pg_regress in 002_regress_tests.pl,
> the execution is currently commented out because it will fail due
> unimplemented commands in the DDL deparser.
>
> [1] https://www.postgresql.org/message-id/flat/CAH8n8_jMTunxxtP4L-3tc%3DGNamg%3Dmg1X%3DtgHr9CqqjjzFLwQng%40mail.gmail.com
>
The test patch is very useful.
I see that the sql case in test_ddl_deparse_regress is similar to the
one in test_ddl_deparse.
Why don't we merge the test cases in test_ddl_deparse_regress into
test_ddl_deparse,
as the sql case can be completely reused with the sql files in test_ddl_deparse?
I believe this will make the tests more comprehensive and reduce redundancy.
Regards,
li jie
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-19 13:19:14 |
| Message-ID: | CAFPTHDYqL+65ApSZWBmFRpc4aeYaW3UAeGhyYWOQJNx1eAQ+og@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Mon, Oct 31, 2022 at 7:07 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are some more comments for the patch v32-0001, file:
> src/backend/commands/ddl_deparse.c
>
> This is a WIP, it being such a large file...
>
> ======
>
> 1. General - comments
>
> For better consistency, I suggest using uppercase for all the
> single-line comments in the function bodies.
>
> There are multiple of them - I am not going to itemize them all in
> this post. Please just search/replace all of them
>
> e.g.
> /* add the "ON table" clause */
> /* add the USING clause, if any */
> /* add the USING clause, if any */
>
fixed.
> ~~~
>
> 2. General - object names
>
> There is a bit of inconsistency with the param object names where
> there are multi-words.
>
> Some have underscore (e.g. "is_public", "subtype_diff", "if_not_exists", etc)...
> Many others do not (e.g. "schemaname", "objname", "rolename", etc)...
>
> IMO it would be better to use a consistent naming convention - e,g,
> maybe use '_' *everywhere*
>
>
fixed.
> ~~~
>
> 3. ObjTree
>
> +typedef struct ObjTree
> +{
> + slist_head params; /* Object tree parameters */
> + int numParams; /* Number of parameters in the object tree */
> + StringInfo fmtinfo; /* Format string of the ObjTree */
> + bool present; /* Indicates if boolean value should be stored */
> +} ObjTree;
>
> It seems that this member is called "parameters" in the sense that
> each of these params are destined to be substition-params of for the
> format string part of this struct.
>
> OK. That seems sensible here, but this 'parameter' terminology infests
> this whole source file. IIUC really much of the code is dealing with
> just JSON objects -- they don't become parameters until those objects
> get added into the params list of this structure. Basically, I felt
> the word 'parameter' in comments and the variables called 'param' in
> functions seemed a bit overused...
>
> ~~~
>
> 4. ObjElem
>
> + slist_node node; /* Used in converting back to ObjElem
> + * structure */
> +} ObjElem;
>
> At face value (and without yet seeing the usage), that comment about
> 'node' does not mean much. e.g. this is already an 'ObjElem' struct...
> (??)
>
> ~~~
>
> 5. verbose
>
> +/*
> + * Reduce some unncessary string from the output json stuff when verbose
> + * and "present" member is false. This means these strings won't be merged into
> + * the last DDL command.
> + */
> +bool verbose = true;
>
> The comment needs some rewording to explain what this is about more
> clearly and without the typos
>
> "Reduce some unncessary string from the output json stuff" ???
>
> ~~~
fixed.
>
> 6. add_policy_clauses
>
> + else
> + {
> + append_bool_object(policyStmt, "present", false);
> + }
>
> Something seems strange. Probably I'm wrong but just by code
> inspection it looks like there is potential for there to be multiple
> param {present:false} JSON objects:
>
> {"present" :false},
> {"present" :false},
> {"present" :false},
>
> Shouldn't those all be array elements or something? IIUC apart from
> just DDL, the JSON idea was going to (in future) allow potential
> machine manipulation of the values prior to the replication, but
> having all these ambiguous-looking objects does not seem to lend
> itself to that idea readily. How to know what are each of those params
> representing?
>
This is pruned later; objects whose present flag is false are removed.
> ~~~
>
> 7. append_array_object
>
>
> + }
> +
> + }
>
> Spurious blank line
>
> ~~
>
fixed.
> 8.
>
> + /* Extract the ObjElems whose present flag is true */
> + foreach(lc, array)
> + {
> + ObjElem *elem = (ObjElem *) lfirst(lc);
> +
> + Assert(elem->objtype == ObjTypeObject ||
> + elem->objtype == ObjTypeString);
> +
> + if (!elem->value.object->present &&
> + elem->objtype == ObjTypeObject)
> + array = foreach_delete_current(array, lc);
> + }
> +
> + }
>
> 8a.
> Is that comment correct? Or should it say more like "remove elements
> where present flag is false" ??
>
> 8b.
> It's not clear to me what is going to be the result of deleting the
> array elements that are determined not present. Will this affect the
> length of the array written to JSON? What if there is nothing left at
> all - the top of this function return if the array length is zero, but
> the bottom(after the loop) has not got similar logic.
>
fixed, added a check at the end of the loop.
> ~~~
>
> 9. append_bool_object
>
> + /*
> + * Check if the present is part of the format string and store the boolean
> + * value
> + */
> + if (strcmp(sub_fmt, "present") == 0)
>
> The comment seems not right. Looks like not testing "present" is PART
> of the format string - it is testing it IS the ENTIRE format string.
>
fixed.
> ~~~
>
> 10. append_object_to_format_string
>
> + initStringInfo(&object_name);
> + end_ptr = sub_fmt + strlen(sub_fmt);
> +
> + for (cp = sub_fmt; cp < end_ptr; cp++)
> + {
> + if (*cp == '{')
> + {
> + start_copy = true;
> + continue;
> + }
> +
> + if (!start_copy)
> + continue;
> +
> + if (*cp == ':' || *cp == '}')
> + break;
> +
> + appendStringInfoCharMacro(&object_name, *cp);
> + }
>
> Instead of this little loop why doesn't the code just look for the
> name delimiters?
>
> e.g.
> pstart = strch(sub_fmt, '{');
> pend = strbrk(pstart, ":}");
>
> then the 'name' is what lies in between...
>
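For comment 10, the delimiter-based lookup could be sketched as below (the
standard functions are strchr() and strpbrk(); this assumes sub_fmt always
contains a '{' followed by a ':' or '}'):

char *pstart = strchr(sub_fmt, '{');
char *pend = strpbrk(pstart + 1, ":}");

/* The attribute name is the text between the two delimiters. */
appendBinaryStringInfo(&object_name, pstart + 1, pend - (pstart + 1));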
> ~~~
>
> 11.
>
> format_type_detailed(Oid type_oid, int32 typemod,
> Oid *nspid, char **typname, char **typemodstr,
> bool *typarray)
>
>
> There seems a bit mixture of param prefixes of both 'typ' and 'type'.
> Is it correct? If these are changed, check also in the function
> comment.
>
fixed.
> ~~~
>
> 12.
>
> + /*
> + * Special-case crock for types with strange typmod rules where we put
> + * typmod at the middle of name(e.g. TIME(6) with time zone ). We cannot
> + * schema-qualify nor add quotes to the type name in these cases.
> + */
>
> Missing space before '(e.g.'. Extra space before ').'.
>
fixed.
> ~~~
>
> 13. FunctionGetDefaults
>
> /*
> * Return the defaults values of arguments to a function, as a list of
> * deparsed expressions.
> */
>
> "defaults values" -> "default values"
>
fixed.
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v53-0002-Support-DDL-replication.patch | application/octet-stream | 132.2 KB |
| v53-0005-Skip-ALTER-TABLE-subcommands-generated-for-Table.patch | application/octet-stream | 2.2 KB |
| v53-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.6 KB |
| v53-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v53-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 323.3 KB |
| v53-0006-Support-DDL-replication-of-alter-type-having-USI.patch | application/octet-stream | 9.0 KB |
| v53-0007-Introduce-the-test_ddl_deparse_regress-test-modu.patch | application/octet-stream | 46.7 KB |
| From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-19 15:29:49 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On 2022-Oct-31, Peter Smith wrote:
> 6. add_policy_clauses
>
> + else
> + {
> + append_bool_object(policyStmt, "present", false);
> + }
>
> Something seems strange. Probably I'm wrong but just by code
> inspection it looks like there is potential for there to be multiple
> param {present:false} JSON objects:
>
> {"present" :false},
> {"present" :false},
> {"present" :false},
>
> Shouldn't those all be array elements or something? IIUC apart from
> just DDL, the JSON idea was going to (in future) allow potential
> machine manipulation of the values prior to the replication, but
> having all these ambiguous-looking objects does not seem to lend
> itself to that idea readily. How to know what are each of those params
> representing?
Do you mean that a single JSON object has multiple members with
"present":"false"? That sounds like something we should never produce,
and post-processing to remove them does not sound good either. Is that
really what is happening, or do I misunderstand?
Obviously, if you have an object with several sub-objects, each of the
sub-objects can have its own "present:false" label. The idea is that
the clause that each subobject represents may not be in the command as
written by the user; but perhaps a post-processor of the JSON blob wants
to change things so that the clause does appear in the final output.
And this should be doable for each individual optional clause in each
command, which means that, yeah, there should be multiple
"present:false" pairs in a single JSON blob, in different paths.
(For example, if the user writes "CREATE SEQUENCE foobar", we would get
a tree that has {fmt: "CACHE %{value}", present: false, value: 32}, so
if you just convert that to text DDL without further ado you would get
the original command verbatim; but you can poke the "present" to true so
you would get "CREATE SEQUENCE foobar CACHE 32".)
Also, I think I came up with the idea of having "present:boolean" a bit
late in the development of this code, so it's quite possible that there
are commands that are inconsistent in their support of this pattern.
That makes it especially important to review the representation of each
command carefully.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Postgres is bloatware by design: it was built to house
PhD theses." (Joey Hellerstein, SIGMOD annual conference 2002)
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | li jie <ggysxcq(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-19 21:25:59 |
| Message-ID: | CAAD30UKwHmLo1YyjB1j_vCEth0+OS9hkLr9Nrrsw5uwEuOs=+g@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Attaching v54 patch set which adds support for:
- CREATE/ALTER/DROP FOREIGN TABLE
- IMPORT FOREIGN SCHEMA; this is captured and replicated as an individual
CREATE FOREIGN TABLE command for each FOREIGN TABLE in the SCHEMA.
Note:
DROP FOREIGN TABLE ft1 also generates:
DROP type IF EXISTS ft1;
and
DROP type IF EXISTS ft1[];
These two dropped objects are also captured and replicated to the
subscriber along with the DROP FOREIGN TABLE command, even though they
aren't necessary.
In addition, the patch fixes a bug in deparse_CreateSchemaStmt which
caused a quoted identifier to fail in replication. For example,
CREATE SCHEMA "S 2"; was replicated as CREATE SCHEMA S 2, which will
fail during apply.
The fix is to change %{name}s -> %{name}I in deparse_CreateSchemaStmt.
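For illustration, the corrected format string would look roughly like this (a
sketch; the surrounding deparse_CreateSchemaStmt code is assumed, and only the
%{name}I part reflects the actual fix):

createSchema = new_objtree_VA("CREATE SCHEMA %{if_not_exists}s %{name}I", 2,
                              "if_not_exists", ObjTypeString,
                              node->if_not_exists ? "IF NOT EXISTS" : "",
                              "name", ObjTypeString, node->schemaname);
/* %{name}I deparses the schema name as a quoted identifier, so "S 2"
 * round-trips correctly on the subscriber. */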
On Mon, Dec 19, 2022 at 5:02 AM li jie <ggysxcq(at)gmail(dot)com> wrote:
>
> I have presented some comments below:
Thanks for the feedback. I'll look into these.
> 1. AT_AddColumn
>
> > + tmpobj = new_objtree_VA("ADD %{objtype}s %{definition}s", 3,
> [ IF NOT EXISTS ] is missing here.
>
......
>
> 9. regress test
>
> The test patch is very useful.
> I see that the sql case in test_ddl_deparse_regress is similar to the
> one in test_ddl_deparse.
> Why don't we merge the test cases in test_ddl_deparse_regress into
> test_ddl_deparse,
> as the sql case can be completely reused with the sql files in test_ddl_deparse?
> I believe this will make the tests more comprehensive and reduce redundancy.
We have set up test_ddl_deparse_regress as a new module initially to
not interfere with what's being tested by test_ddl_deparse. We could
merge the two test modules if it turns out that we can expand on
test_ddl_deparse to achieve our testing goals and to add more test
cases without breaking what's currently being tested by
test_ddl_deparse.
Regards,
Zheng
| Attachment | Content-Type | Size |
|---|---|---|
| v54-0005-Skip-ALTER-TABLE-subcommands-generated-for-TableLike.patch | application/octet-stream | 2.2 KB |
| v54-0003-Support-CREATE-TABLE-AS-SELECT-INTO.patch | application/octet-stream | 15.7 KB |
| v54-0002-Support-DDL-replication.patch | application/octet-stream | 132.7 KB |
| v54-0004-Test-cases-for-DDL-replication.patch | application/octet-stream | 24.6 KB |
| v54-0001-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 327.4 KB |
| v54-0007-Introduce-the-test_ddl_deparse_regress-test-module.patch | application/octet-stream | 46.8 KB |
| v54-0006-Support-DDL-replication-of-alter-type-having-USING-e.patch | application/octet-stream | 9.0 KB |
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-21 00:02:51 |
| Message-ID: | CAHut+Pvo2cKHBhRpg=tWTsLJrsjBjAQ2vuDQ_-1UVdyvarxQgw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, Dec 20, 2022 at 2:29 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2022-Oct-31, Peter Smith wrote:
>
> > 6. add_policy_clauses
> >
> > + else
> > + {
> > + append_bool_object(policyStmt, "present", false);
> > + }
> >
> > Something seems strange. Probably I'm wrong but just by code
> > inspection it looks like there is potential for there to be multiple
> > param {present:false} JSON objects:
> >
> > {"present" :false},
> > {"present" :false},
> > {"present" :false},
> >
> > Shouldn't those all be array elements or something? IIUC apart from
> > just DDL, the JSON idea was going to (in future) allow potential
> > machine manipulation of the values prior to the replication, but
> > having all these ambiguous-looking objects does not seem to lend
> > itself to that idea readily. How to know what are each of those params
> > representing?
>
> Do you mean that a single JSON object has multiple member with
> "present":"false"? That sounds like something we should never produce,
> and post-processing to remove them does not sound good either. Is that
> really what is happening, or do I misunderstand?
>
Yes, that is what I meant.
The add_policy_clauses code below is from the latest patch v54-0001:
+static void
+add_policy_clauses(ObjTree *ret, Oid policyOid, List *roles, bool do_qual,
+ bool do_with_check)
+{
+ Relation polRel = table_open(PolicyRelationId, AccessShareLock);
+ HeapTuple polTup = get_catalog_object_by_oid(polRel,
Anum_pg_policy_oid, policyOid);
+ Form_pg_policy polForm;
+
+ if (!HeapTupleIsValid(polTup))
+ elog(ERROR, "cache lookup failed for policy with OID %u", policyOid);
+
+ polForm = (Form_pg_policy) GETSTRUCT(polTup);
+
+ /* Add the "ON table" clause */
+ append_object_object(ret, "ON %{table}D",
+ new_objtree_for_qualname_id(RelationRelationId,
+ polForm->polrelid));
+
+ /*
+ * Add the "TO role" clause, if any. In the CREATE case, it always
+ * contains at least PUBLIC, but in the ALTER case it might be empty.
+ */
+ if (roles)
+ {
+ List *list = NIL;
+ ListCell *cell;
+
+ foreach(cell, roles)
+ {
+ RoleSpec *spec = (RoleSpec *) lfirst(cell);
+
+ list = lappend(list,
+ new_object_object(new_objtree_for_rolespec(spec)));
+ }
+ append_array_object(ret, "TO %{role:, }R", list);
+ }
+ else
+ append_not_present(ret);
+
+ /* Add the USING clause, if any */
+ if (do_qual)
+ {
+ Datum deparsed;
+ Datum storedexpr;
+ bool isnull;
+
+ storedexpr = heap_getattr(polTup, Anum_pg_policy_polqual,
+ RelationGetDescr(polRel), &isnull);
+ if (isnull)
+ elog(ERROR, "null polqual expression in policy %u", policyOid);
+ deparsed = DirectFunctionCall2(pg_get_expr, storedexpr, polForm->polrelid);
+ append_string_object(ret, "USING (%{expression}s)", "expression",
+ TextDatumGetCString(deparsed));
+ }
+ else
+ append_not_present(ret);
+
+ /* Add the WITH CHECK clause, if any */
+ if (do_with_check)
+ {
+ Datum deparsed;
+ Datum storedexpr;
+ bool isnull;
+
+ storedexpr = heap_getattr(polTup, Anum_pg_policy_polwithcheck,
+ RelationGetDescr(polRel), &isnull);
+ if (isnull)
+ elog(ERROR, "null polwithcheck expression in policy %u", policyOid);
+ deparsed = DirectFunctionCall2(pg_get_expr, storedexpr, polForm->polrelid);
+ append_string_object(ret, "WITH CHECK (%{expression}s)",
+ "expression", TextDatumGetCString(deparsed));
+ }
+ else
+ append_not_present(ret);
+
+ relation_close(polRel, AccessShareLock);
+}
Actually, I have not yet tried running this so maybe I am mistaken,
but looking at the code above I thought if 'roles' is NULL and
'do_qual' is false and 'do_with_check' is false then the logic could
end up doing:
+ append_not_present(ret);
+ append_not_present(ret);
+ append_not_present(ret);
> Obviously, if you have an object with several sub-objects, each of the
> sub-objects can have its own "present:false" label. The idea is that
> the clause that each subobject represents may not be in the command as
> written by the user; but perhaps a post-processor of the JSON blob wants
> to change things so that the clause does appear in the final output.
> And this should be doable for each individual optional clause in each
> command, which means that, yeah, there should be multiple
> "present:false" pairs in a single JSON blob, in different paths.
>
> (For example, if the user writes "CREATE SEQUENCE foobar", we would get
> a tree that has {fmt: "CACHE %{value}", present: false, value: 32}, so
> if you just convert that to text DDL without further ado you would get
> the original command verbatim; but you can poke the "present" to true so
> you would get "CREATE SEQUENCE foobar CACHE 32".)
>
>
> Also, I think I came up with the idea of having "present:boolean" a bit
> late in the development of this code, so it's quite possible that there
> are commands that are inconsistent in their support of this pattern.
> That makes it especially important to review the representation of each
> command carefully.
>
------
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | li jie <ggysxcq(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-21 17:21:51 |
| Message-ID: | CALDaNm2V6YL6H4P9ZT95Ua_RDJaeDTUf6V0UDfrz4_vxhM5pMg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, 20 Dec 2022 at 02:56, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> Attaching v54 patch set which adds support for:
> - CREATE/ALTER/DROP FOREIGN TABLE
> - IMPORT FOREIGN SCHEMA, this is captured and replicated as individual
> CREATE FOREIGN TABLE command for each FOREIGN TABLE in the SCHEMA.
>
> Note:
> DROP FOREIGN TABLE ft1 also generates:
> DROP type IF EXISTS ft1;
> and
> DROP type IF EXISTS ft1[];
> These two dropped objects are also captured and replicated to the
> subscriber along with the DROP FOREIGN TABLE command which aren't
> necessary.
>
> In addition, the patch fixed a bug in deparse_CreateSchemaStmt which
> causes a quoted identifier to fail in replication, for example:
> CREATE SCHEMA "S 2"; is replicated as CREATE SCHEMA S 2, which will
> fail during apply.
> Fix is to change %{name}s -> %{name}I in deparse_CreateSchemaStmt.
>
> On Mon, Dec 19, 2022 at 5:02 AM li jie <ggysxcq(at)gmail(dot)com> wrote:
> >
> > I have presented some comments below:
>
> Thanks for the feedback. I'll look into these.
>
> > 1. AT_AddColumn
> >
> > > + tmpobj = new_objtree_VA("ADD %{objtype}s %{definition}s", 3,
> > [ IF NOT EXISTS ] is missing here.
> >
> ......
> >
> > 9. regress test
> >
> > The test patch is very useful.
> > I see that the sql case in test_ddl_deparse_regress is similar to the
> > one in test_ddl_deparse.
> > Why don't we merge the test cases in test_ddl_deparse_regress into
> > test_ddl_deparse,
> > as the sql case can be completely reused with the sql files in test_ddl_deparse?
> > I believe this will make the tests more comprehensive and reduce redundancy.
>
> We have set up test_ddl_deparse_regress as a new module initially to
> not interfere with what's being tested by test_ddl_deparse. We could
> merge the two test modules if it turns out that we can expand on
> test_ddl_deparse to achieve our testing goals and to add more test
> cases without breaking what's currently being tested by
> test_ddl_deparse.
I have handled most of the comments for [1] in the v55 version patch
attached. I will handle the pending comments in the upcoming version
and reply to it.
[1] - https://www.postgresql.org/message-id/20221207122041.hbfj4hen3ibhdzgn%40alvherre.pgsql
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v55-0001-Infrastructure-to-support-DDL-deparsing.patch | text/x-patch | 32.5 KB |
| v55-0003-DDL-messaging-infrastructure-for-DDL-replication.patch | text/x-patch | 10.6 KB |
| v55-0005-Introduce-the-test_ddl_deparse_regress-test-modu.patch | text/x-patch | 46.8 KB |
| v55-0004-Support-DDL-replication.patch | text/x-patch | 166.7 KB |
| v55-0002-Functions-to-deparse-DDL-commands.patch | text/x-patch | 299.6 KB |
| From: | "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com> |
|---|---|
| To: | 'vignesh C' <vignesh21(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | li jie <ggysxcq(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2022-12-23 06:03:22 |
| Message-ID: | TYCPR01MB8373612CC162919AF786D07EEDE99@TYCPR01MB8373.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thursday, December 22, 2022 2:22 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> I have handled most of the comments for [1] in the v55 version patch attached.
> I will handle the pending comments in the upcoming version and reply to it.
> [1] -
> https://www.postgresql.org/message-id/20221207122041.hbfj4hen3ibhdzgn%
> 40alvherre.pgsql
Hi, Vignesh
FYI, cfbot reports a failure for v55 in [1].
Could you check it?
[1] - https://cirrus-ci.com/task/6366800263249920
Best Regards,
Takamichi Osumi
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-27 06:13:53 |
| Message-ID: | CALDaNm0uXh=RgGD=oB1p83GONb5=L2n3nbpiLGVaMd57TimdZA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, 23 Dec 2022 at 11:33, Takamichi Osumi (Fujitsu)
<osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
>
> On Thursday, December 22, 2022 2:22 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > I have handled most of the comments for [1] in the v55 version patch attached.
> > I will handle the pending comments in the upcoming version and reply to it.
> > [1] -
> > https://www.postgresql.org/message-id/20221207122041.hbfj4hen3ibhdzgn%
> > 40alvherre.pgsql
> Hi, Vignesh
>
>
> FYI, cfbot causes a failure for v55 in [1].
> Could you check it ?
Thanks for letting me know; the updated patch has the changes for the same.
I have changed the patch ordering slightly, as deparsing can be tested
using ddl_deparse_to_json and ddl_deparse_expand_command after the 0002
changes. I have also moved the reorderbuffer and decode changes to the
0004 patch.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v56-0001-Infrastructure-to-support-DDL-deparsing.patch | text/x-patch | 32.7 KB |
| v56-0003-Introduce-the-test_ddl_deparse_regress-test-modu.patch | text/x-patch | 43.9 KB |
| v56-0004-DDL-messaging-infrastructure-for-DDL-replication.patch | text/x-patch | 41.5 KB |
| v56-0002-Functions-to-deparse-DDL-commands.patch | text/x-patch | 300.1 KB |
| v56-0005-Support-DDL-replication.patch | text/x-patch | 158.1 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-27 12:03:38 |
| Message-ID: | CALDaNm1n2xEcYTX-kxjEwFmuExc9_ZLmN5GB+t_THOGkUxTmhA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Tue, 27 Dec 2022 at 11:43, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Fri, 23 Dec 2022 at 11:33, Takamichi Osumi (Fujitsu)
> <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> >
> > On Thursday, December 22, 2022 2:22 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > I have handled most of the comments for [1] in the v55 version patch attached.
> > > I will handle the pending comments in the upcoming version and reply to it.
> > > [1] -
> > > https://www.postgresql.org/message-id/20221207122041.hbfj4hen3ibhdzgn%
> > > 40alvherre.pgsql
> > Hi, Vignesh
> >
> >
> > FYI, cfbot causes a failure for v55 in [1].
> > Could you check it ?
>
> Thanks for letting me know, the updated patch has the changes for the same.
> I have changed the patch ordering slightly as deparsing can be tested
> using ddl_deparse_to_json and ddl_deparse_expand_command after 0002
> changes. I have also moved reorderbuffer and decode changes to 0004
> patch.
I had missed including one of the files in meson.build, which is why
cfbot was failing; the updated version fixes this.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v56-0004-DDL-messaging-infrastructure-for-DDL-replication.patch | text/x-patch | 41.5 KB |
| v56-0003-Introduce-the-test_ddl_deparse_regress-test-modu.patch | text/x-patch | 43.9 KB |
| v56-0001-Infrastructure-to-support-DDL-deparsing.patch | text/x-patch | 32.7 KB |
| v56-0002-Functions-to-deparse-DDL-commands.patch | text/x-patch | 300.1 KB |
| v56-0005-Support-DDL-replication.patch | text/x-patch | 158.5 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | li jie <ggysxcq(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-27 22:32:59 |
| Message-ID: | CAAD30UL3PPtaEOJ+PY9b_UKLWSVyLsAwihZPHqBF1z_JLA3nvg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Please find the attached patch set which addresses the following comments.
On Mon, Dec 19, 2022 at 5:02 AM li jie <ggysxcq(at)gmail(dot)com> wrote:
>
> I have presented some comments below:
>
> 1. AT_AddColumn
>
> > + tmpobj = new_objtree_VA("ADD %{objtype}s %{definition}s", 3,
> [ IF NOT EXISTS ] is missing here.
Fixed.
> 2. AT_DropColumn
> > + tmpobj = new_objtree_VA("DROP %{objtype}s %{column}I", 3,
> [ IF EXISTS ] is missing here.
Fixed.
> 3. AT_DropConstraint
> > + tmpobj = new_objtree_VA("DROP CONSTRAINT %{constraint}I", 2,
> [ IF EXISTS ] is missing here.
Fixed.
> 4. AT_DetachPartition
> > + tmpobj = new_objtree_VA("DETACH PARTITION %{partition_identity}D", 2,
> [ CONCURRENTLY | FINALIZE ] is missing here.
Fixed.
> 5. deparse_CreateSeqStmt
> > + ret = new_objtree_VA("CREATE %{persistence}s SEQUENCE %{identity}D %{definition: }s", 3,
> [ IF NOT EXISTS ] is missing here.
Fixed.
> 6. deparse_IndexStmt
> > + ret = new_objtree_VA("CREATE %{unique}s INDEX %{concurrently}s %{if_not_exists}s %{name}I ON %{table}D USING %{index_am}s (%{definition}s)", 7,
> [ INCLUDE ] and [ ONLY ] are missing here.
Fixed and added a test case in
src/test/modules/test_ddl_deparse_regress/sql/create_table.sql.
> 7. deparse_RuleStmt
> > + foreach(cell, actions)
> > + list = lappend(list, new_string_object(lfirst(cell)));
>
> if (actions == NIL)
> list = lappend(list, new_string_object("NOTHING"));
> else
> {
> foreach(cell, actions)
> list = lappend(list, new_string_object(lfirst(cell)));
> }
Fixed.
> 8. AT_AddIndexConstraint
> > + tmpobj = new_objtree_VA("ADD CONSTRAINT %{name}I %{constraint_type}s USING INDEX %{index_name}I %{deferrable}s %{init_deferred}s", 6,
> > + "type", ObjTypeString, "add constraint using index",
> > + "name", ObjTypeString, get_constraint_name(constrOid),
> > + "constraint_type", ObjTypeString,
> > + istmt->deferrable ? "DEFERRABLE" : "NOT DEFERRABLE",
>
> "constraint_type", ObjTypeString,
> istmt->primary ? "PRIMARY KEY" : "UNIQUE",
Fixed.
Regards,
Zane
| Attachment | Content-Type | Size |
|---|---|---|
| v57-0003-Introduce-the-test_ddl_deparse_regress-test-module.patch | application/octet-stream | 44.8 KB |
| v57-0002-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 301.1 KB |
| v57-0004-DDL-messaging-infrastructure-for-DDL-replication.patch | application/octet-stream | 41.5 KB |
| v57-0005-Support-DDL-replication.patch | application/octet-stream | 158.6 KB |
| v57-0001-Infrastructure-to-support-DDL-deparsing.patch | application/octet-stream | 32.7 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | li jie <ggysxcq(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-28 22:42:43 |
| Message-ID: | CAAD30ULrZB-RNmuD3NMr1jGNUt15ZpPgFdFRX53HbcAB76hefw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
>- CreatePublication has a long list of command tags; is that good?
>Maybe it'd be better to annotate the list in cmdtaglist.h somehow.
I've addressed this comment by introducing a new flag, ddlreplok, in
the PG_CMDTAG macro and modifying CreatePublication accordingly.
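To illustrate the idea (a sketch only -- the existing flag values are
as in today's cmdtaglist.h, and the position of the new ddlreplok
column here is just for illustration, it is whatever the patch
defines), an entry would then look like:

/* PG_CMDTAG(tag, textual rep, event trigger?, table rewrite?, rowcount?, ddl replication?) */
PG_CMDTAG(CMDTAG_ALTER_TABLE, "ALTER TABLE", true, true, false, true)
PG_CMDTAG(CMDTAG_VACUUM, "VACUUM", false, false, false, false)

so that CreatePublication can look the property up per command tag
instead of carrying its own hard-coded list.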
Regards,
Zane
| Attachment | Content-Type | Size |
|---|---|---|
| v58-0004-DDL-messaging-infrastructure-for-DDL-replication.patch | application/octet-stream | 41.5 KB |
| v58-0003-Introduce-the-test_ddl_deparse_regress-test-module.patch | application/octet-stream | 44.8 KB |
| v58-0005-Support-DDL-replication.patch | application/octet-stream | 187.6 KB |
| v58-0001-Infrastructure-to-support-DDL-deparsing.patch | application/octet-stream | 32.6 KB |
| v58-0002-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 301.1 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | li jie <ggysxcq(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2022-12-28 22:50:48 |
| Message-ID: | CAAD30UL0EmSGk58eZxxYzFTQ5EqSpX6d+OcX5MuE0j0r2ZgyDA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, Dec 28, 2022 at 5:42 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> >- CreatePublication has a long list of command tags; is that good?
> >Maybe it'd be better to annotate the list in cmdtaglist.h somehow.
>
> I've addressed this comment by introducing a new flag ddlreplok in the
> PG_CMDTAG macro and modified CreatePublication accordingly.
I noticed that support for the following commands is missing while
moving the command tags; I will look into it:
CMDTAG_CREATE_AGGREGATE
CMDTAG_ALTER_AGGREGATE
CMDTAG_DROP_AGGREGATE
CMDTAG_DROP_TRIGGER
CMDTAG_DROP_USER_MAPPING
Zane
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-01-02 08:06:00 |
| Message-ID: | CALDaNm0-TpyigxOu_=_7x0X4QM5pCXH7ZEkUDEPyJYy6XSz_qg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, 7 Dec 2022 at 17:50, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> I think this patch is split badly.
>
> You have:
>
> 0001 an enormous patch including some required infrastructure, plus the
> DDL deparsing bits themselves.
>
> 0002 another enormous (though not as much) patch, this time for
> DDL replication using the above.
>
> 0003 a bugfix for 0001, which includes changes in both the
> infrastructure and the deparsing bits.
>
> 0004 test stuff for 0002.
>
> 0005 Another bugfix for 0001
>
> 0006 Another bugfix for 0001
>
> As presented, I think it has very little chance of being reviewed
> usefully. A better way to go about this, I think, would be:
>
> 0001 - infrastructure bits to support the DDL deparsing parts (all these
> new functions in ruleutils.c, sequence.c, etc). That means, everything
> (?) that's currently in your 0001 except ddl_deparse.c and friends.
> Clearly there are several independent changes here; maybe it is possible
> to break it down even further. This patch or these patches should also
> include the parts of 0003, 0005, 0006 that require changes outside of
> ddl_deparse.c.
> I expect that this patch should be fairly small.
Modified
> 0002 - ddl_deparse.c and its very close friends. This should not have
> any impact on places such as ruleutils.c, sequence.c, etc. The parts of
> the bugfixes (0001, 0005, 0006) that touch this could should be merged
> here as well; there's no reason to have them as separate patches. Some
> test code should be here also, though it probably doesn't need to aim to
> be complete.
> This one is likely to be very large, but also self-contained.
Modified. I have currently kept the deparse tests as a separate patch,
since we are planning to add more tests to it. We can later merge it
into 0002 if required, or keep it as 0003 if it is too large.
> 0003 - ddlmessage.c and friends. I understand that DDL-messaging is
> supporting infrastructure for DDL replication; I think it should be its
> own patch. Probably include its own simple-ish test bits.
> Not a very large patch.
Modified
> 0004 - DDL replication proper, including 0004.
> Probably not a very large patch either, not sure.
Modified
> Some review comments, just skimming:
> - 0002 adds some functions to event_trigger.c, but that doesn't seem to
> be their place. Maybe some new file in src/backend/replication/logical
> would make more sense.
Modified
> - publication_deparse_ddl_command_end has a long strcmp() list; why?
> Maybe change things so that it compares some object type enum instead.
Modified wherever possible
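To make the idea concrete for the archives, the general change --
replacing string comparisons with a comparison on an enum -- looks
roughly like this, using the command tag as an example (a sketch only,
not the exact patch code; the CollectedCommand variable name is
assumed):

    CommandTag  tag = CreateCommandTag(cmd->parsetree);

    switch (tag)
    {
        case CMDTAG_CREATE_TABLE:
        case CMDTAG_ALTER_TABLE:
            /* deparse and queue this command for replication */
            break;
        default:
            /* not of interest for DDL replication */
            break;
    }

i.e. a switch over an enum rather than a chain of strcmp() calls on
the tag strings.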
> - CreatePublication has a long list of command tags; is that good?
> Maybe it'd be better to annotate the list in cmdtaglist.h somehow.
> - The change in pg_dump's getPublications needs updated to 16.
Modified
> - Don't "git add" src/bin/pg_waldump/logicalddlmsgdesc.c, just update
> its Makefile and meson.build
Modified
>
> - I think psql's \dRp should not have the new column at the end.
> Maybe one of:
> + Name | Owner | DDL | All tables | Inserts | Updates | Deletes | Truncates | Via root
> + Name | Owner | All tables | DDL | Inserts | Updates | Deletes | Truncates | Via root
> + Name | Owner | All tables | Inserts | Updates | Deletes | Truncates | DDL | Via root
> (I would not add the "s" at the end of that column title, also).
Modified it to:
Name | Owner | All tables | DDL | Inserts | Updates | Deletes | Truncates | Via root
These issues were addressed in the v55 [1], v56 [2] and v58 [3]
versions posted.
[1] - https://www.postgresql.org/message-id/CALDaNm2V6YL6H4P9ZT95Ua_RDJaeDTUf6V0UDfrz4_vxhM5pMg%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CALDaNm0uXh%3DRgGD%3DoB1p83GONb5%3DL2n3nbpiLGVaMd57TimdZA%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/CAAD30ULrZB-RNmuD3NMr1jGNUt15ZpPgFdFRX53HbcAB76hefw%40mail.gmail.com
Regards,
Vignesh
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | li jie <ggysxcq(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-01-04 21:35:07 |
| Message-ID: | CAAD30ULAVzCnNOsnHwZWoeWfaYr3yDtM-7TnkMJf9HTrW7HVCQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> On Wed, Dec 28, 2022 at 5:42 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> I notice the support for the following commands are missing while
> moving the command tags, will look into it:
> CMDTAG_CREATE_AGGREGATE
> CMDTAG_ALTER_AGGREGATE
> CMDTAG_DROP_AGGREGATE
> CMDTAG_DROP_TRIGGER
> CMDTAG_DROP_USER_MAPPING
Added support for the above commands in the attached v59 patch.
Regards,
Zane
| Attachment | Content-Type | Size |
|---|---|---|
| v59-0005-Support-DDL-replication.patch | application/octet-stream | 187.7 KB |
| v59-0001-Infrastructure-to-support-DDL-deparsing.patch | application/octet-stream | 32.6 KB |
| v59-0003-Introduce-the-test_ddl_deparse_regress-test-module.patch | application/octet-stream | 47.4 KB |
| v59-0004-DDL-messaging-infrastructure-for-DDL-replication.patch | application/octet-stream | 41.5 KB |
| v59-0002-Functions-to-deparse-DDL-commands.patch | application/octet-stream | 310.6 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | li jie <ggysxcq(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-01-07 15:28:11 |
| Message-ID: | CAAD30U+fqaaD6533-eiaWVHpUaBNBCEvqyXOT_ow1B--Aa_jOQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hi,
I added documentation and changed the user interface design in the
attached v60 patch set.
The patch set addresses comments from Peter in [1].
The motivation for the user interface change is that we want to manage
the DDL replication feature in stages, with fine-grained replication
levels. For example, we can focus on reviewing and testing table
commands first, then other commands. It also makes sense to introduce
different DDL replication levels from the user perspective, as pointed
out in [1]. We can add more replication levels along the way.
In this patch DDL replication is disabled by default and can be
enabled at different levels using the new PUBLICATION option 'ddl'.
This option currently has two levels and is only allowed to be set if
the PUBLICATION is FOR ALL TABLES.
all: this option enables replication of all supported DDL commands.
table: this option enables replication of table DDL commands, which include:
-CREATE/ALTER/DROP TABLE
-CREATE TABLE AS
Example:
Create a new publication with all ddl replication enabled:
CREATE PUBLICATION pub1 FOR ALL TABLES with (ddl = 'all');
Enable table ddl replication for an existing Publication:
ALTER PUBLICATION pub2 SET (ddl = 'table');
Also, I have broken ddl_deparse.c down into two patches:
1.v60-0002-Functions-to-deparse-Table-DDL-commands.patch
2.v60-0003-Support-DDL-deparse-of-the-rest-commands.patch
Regards,
Zane
| Attachment | Content-Type | Size |
|---|---|---|
| v60-0004-Introduce-the-test_ddl_deparse_regress-test-module.patch | application/octet-stream | 47.4 KB |
| v60-0001-Infrastructure-to-support-DDL-deparsing.patch | application/octet-stream | 32.6 KB |
| v60-0002-Functions-to-deparse-Table-DDL-commands.patch | application/octet-stream | 130.5 KB |
| v60-0005-DDL-messaging-infrastructure-for-DDL-replication.patch | application/octet-stream | 41.5 KB |
| v60-0003-Support-DDL-deparse-of-the-rest-commands.patch | application/octet-stream | 189.4 KB |
| v60-0007-Document-DDL-replication-and-DDL-deparser.patch | application/octet-stream | 40.6 KB |
| v60-0006-Support-DDL-replication.patch | application/octet-stream | 198.0 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-01-13 06:33:26 |
| Message-ID: | CALDaNm0mnkHXnQ7zMDxY6=ZtWn8A1iKLKGhEtqRVC9c3gc8YCw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Sat, 7 Jan 2023 at 20:58, Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> Hi,
>
> I added documentation and changed user interface design in the
> attached v60 patch set.
> The patch set addressed comments from Peter in [1].
>
> The motivation for the user interface change is that we want to manage
> DDL replication feature in stages with fine grained replication
> levels.
> For example, we can focus on reviewing and testing table commands
> first, then other commands. It also make sense to introduce different
> DDL replication levels
> from the user perspective as pointed out in [1]. We can add more
> replication levels along the way.
>
> In this patch DDL replication is disabled by default and it can be
> enabled at different levels
> using the new PUBLICATION option 'ddl'. This option currently has two
> levels and are
> only allowed to be set if the PUBLICATION is FOR ALL TABLES.
>
> all: this option enables replication of all supported DDL commands.
> table: this option enables replication of Table DDL commands which include:
> -CREATE/ALTER/DROP TABLE
> -CREATE TABLE AS
>
> Example:
> Create a new publication with all ddl replication enabled:
> CREATE PUBLICATION pub1 FOR ALL TABLES with (ddl = 'all');
>
> Enable table ddl replication for an existing Publication:
> ALTER PUBLICATION pub2 SET (ddl = 'table');
>
> Also I break down ddl_deparse.c into two patches:
> 1.v60-0002-Functions-to-deparse-Table-DDL-commands.patch
> 2.v60-0003-Support-DDL-deparse-of-the-rest-commands.patch
Added support for CREATE/ALTER/DROP PUBLICATION DDL deparsing; the
attached v61 patch contains these changes.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v61-0001-Infrastructure-to-support-DDL-deparsing.patch | text/x-patch | 32.6 KB |
| v61-0002-Functions-to-deparse-Table-DDL-commands.patch | text/x-patch | 130.9 KB |
| v61-0005-DDL-messaging-infrastructure-for-DDL-replication.patch | text/x-patch | 41.5 KB |
| v61-0004-Introduce-the-test_ddl_deparse_regress-test-modu.patch | text/x-patch | 47.4 KB |
| v61-0003-Support-DDL-deparse-of-the-rest-commands.patch | text/x-patch | 196.1 KB |
| v61-0006-Support-DDL-replication.patch | text/x-patch | 210.9 KB |
| v61-0007-Document-DDL-replication-and-DDL-deparser.patch | text/x-patch | 40.6 KB |
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-01-15 04:09:25 |
| Message-ID: | CAFPTHDY_9_xd8JEckB=Au2ALPnx+hfmn9ztutVw4aVwZxLhrNQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Jan 13, 2023 at 5:33 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> Adding support for CREATE/ALTER/DROP Publication ddl deparsing.
> The attached v61 patch has the changes for the same.
>
Hi Vignesh,
this doesn't seem to compile:
gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -g -ggdb -Og -g3
-fno-omit-frame-pointer -I../../../src/include -D_GNU_SOURCE -c -o
ddl_deparse.o ddl_deparse.c
ddl_deparse.c: In function ‘deparse_PublicationObjects’:
ddl_deparse.c:8956:3: error: unknown type name ‘publication_rel’
publication_rel *pub_rel = (publication_rel *) lfirst(lc1);
^
ddl_deparse.c:8956:31: error: ‘publication_rel’ undeclared (first use
in this function)
publication_rel *pub_rel = (publication_rel *) lfirst(lc1);
^
ddl_deparse.c:8956:31: note: each undeclared identifier is reported
only once for each function it appears in
ddl_deparse.c:8956:48: error: expected expression before ‘)’ token
publication_rel *pub_rel = (publication_rel *) lfirst(lc1);
regards,
Ajin Cherian
Fujitsu Australia
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-01-16 03:12:52 |
| Message-ID: | CALDaNm3YEFNwFkkbzS1J8NiHKX6LWQ_h3CA+CPtkUqAZgD8q+Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Sun, 15 Jan 2023 at 09:39, Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Fri, Jan 13, 2023 at 5:33 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > Adding support for CREATE/ALTER/DROP Publication ddl deparsing.
> > The attached v61 patch has the changes for the same.
> >
>
> Hi Vignesh,
> this doesn't seem to compile:
>
> gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Werror=vla -Wendif-labels
> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
> -fwrapv -fexcess-precision=standard -g -ggdb -Og -g3
> -fno-omit-frame-pointer -I../../../src/include -D_GNU_SOURCE -c -o
> ddl_deparse.o ddl_deparse.c
> ddl_deparse.c: In function ‘deparse_PublicationObjects’:
> ddl_deparse.c:8956:3: error: unknown type name ‘publication_rel’
> publication_rel *pub_rel = (publication_rel *) lfirst(lc1);
> ^
> ddl_deparse.c:8956:31: error: ‘publication_rel’ undeclared (first use
> in this function)
> publication_rel *pub_rel = (publication_rel *) lfirst(lc1);
> ^
> ddl_deparse.c:8956:31: note: each undeclared identifier is reported
> only once for each function it appears in
> ddl_deparse.c:8956:48: error: expected expression before ‘)’ token
> publication_rel *pub_rel = (publication_rel *) lfirst(lc1);
I was able to apply all the patches and compile successfully; cfbot at
[1] has also compiled it successfully. One thing I noticed is that the
structure is only defined in the 0006 patch, so the build might fail if
you did not apply all the patches. We will take care of moving the
structure in the next version.
[1] - https://cirrus-ci.com/task/5231672585617408
Regards,
Vignesh
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-01-16 13:33:49 |
| Message-ID: | CAFPTHDY_4koZ135_Bub_=HPBxpG1u+XBEv8zspwXW7rAQps6eA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Mon, Jan 16, 2023 at 2:13 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Sun, 15 Jan 2023 at 09:39, Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> > On Fri, Jan 13, 2023 at 5:33 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > Adding support for CREATE/ALTER/DROP Publication ddl deparsing.
> > > The attached v61 patch has the changes for the same.
> > >
> >
> > Hi Vignesh,
> > this doesn't seem to compile:
> >
> > gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith
> > -Wdeclaration-after-statement -Werror=vla -Wendif-labels
> > -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
> > -fwrapv -fexcess-precision=standard -g -ggdb -Og -g3
> > -fno-omit-frame-pointer -I../../../src/include -D_GNU_SOURCE -c -o
> > ddl_deparse.o ddl_deparse.c
> > ddl_deparse.c: In function ‘deparse_PublicationObjects’:
> > ddl_deparse.c:8956:3: error: unknown type name ‘publication_rel’
> > publication_rel *pub_rel = (publication_rel *) lfirst(lc1);
> > ^
> > ddl_deparse.c:8956:31: error: ‘publication_rel’ undeclared (first use
> > in this function)
> > publication_rel *pub_rel = (publication_rel *) lfirst(lc1);
> > ^
> > ddl_deparse.c:8956:31: note: each undeclared identifier is reported
> > only once for each function it appears in
> > ddl_deparse.c:8956:48: error: expected expression before ‘)’ token
> > publication_rel *pub_rel = (publication_rel *) lfirst(lc1);
>
> I was able to apply all the patches and compile successfully, also
> cfbot as in [1] has compiled successfully. One thing I noticed is that
> structure is available at 006 patch, if you did not apply all the
> patch it might fail, we will take care of structure movement in the
> next version.
> [1] - https://cirrus-ci.com/task/5231672585617408
>
Fixing an earlier comment from Peter:
On Mon, Oct 31, 2022 at 7:07 PM Peter Smith
<smithpb2250(at)gmail(dot)com> wrote:
>
>0. append_object_to_format_string
>Instead of this little loop why doesn't the code just look for the
>name delimiters?
>
>e.g.
>pstart = strchr(sub_fmt, '{');
>pend = strpbrk(pstart, ":}");
>
>then the 'name' is what lies in between...
Made the logic simpler with strchr.
Also:
1. fixed a compilation warning seen in publicationcmds.c
2. fixed a broken documentation build
3. fixed a build failure of patch 0003 caused by a missing header.
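For reference, the strchr/strpbrk based extraction Peter described is
roughly the following (a standalone sketch, not the exact patch code;
the helper name is made up):

/*
 * Return a copy of the name embedded in "%{name}s" / "%{name:sep}s",
 * i.e. the text between '{' and the first ':' or '}'.  Returns NULL if
 * the delimiters are not present.
 */
static char *
extract_fmt_name(const char *sub_fmt)
{
    const char *pstart = strchr(sub_fmt, '{');
    const char *pend;

    if (pstart == NULL)
        return NULL;

    pend = strpbrk(pstart + 1, ":}");
    if (pend == NULL)
        return NULL;

    return pnstrdup(pstart + 1, pend - (pstart + 1));
}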
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v62-0002-Functions-to-deparse-Table-DDL-commands.patch | application/octet-stream | 130.9 KB |
| v62-0001-Infrastructure-to-support-DDL-deparsing.patch | application/octet-stream | 32.5 KB |
| v62-0005-DDL-messaging-infrastructure-for-DDL-replication.patch | application/octet-stream | 41.2 KB |
| v62-0004-Introduce-the-test_ddl_deparse_regress-test-modu.patch | application/octet-stream | 47.4 KB |
| v62-0003-Support-DDL-deparse-of-the-rest-commands.patch | application/octet-stream | 197.8 KB |
| v62-0006-Support-DDL-replication.patch | application/octet-stream | 209.2 KB |
| v62-0007-Document-DDL-replication-and-DDL-deparser.patch | application/octet-stream | 40.6 KB |
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-01-18 11:26:47 |
| Message-ID: | CAA4eK1+r9RK3npxBv73nExU290fcYMPGYZGBWG1R=-msYG8j=A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Sat, Jan 7, 2023 at 8:58 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> I added documentation and changed user interface design in the
> attached v60 patch set.
> The patch set addressed comments from Peter in [1].
>
> The motivation for the user interface change is that we want to manage
> DDL replication feature in stages with fine grained replication
> levels.
> For example, we can focus on reviewing and testing table commands
> first, then other commands. It also make sense to introduce different
> DDL replication levels
> from the user perspective as pointed out in [1]. We can add more
> replication levels along the way.
>
> In this patch DDL replication is disabled by default and it can be
> enabled at different levels
> using the new PUBLICATION option 'ddl'. This option currently has two
> levels and are
> only allowed to be set if the PUBLICATION is FOR ALL TABLES.
>
> all: this option enables replication of all supported DDL commands.
> table: this option enables replication of Table DDL commands which include:
> -CREATE/ALTER/DROP TABLE
> -CREATE TABLE AS
>
I think this point needs some thought. When you say 'all', how do you
think it will help to support DDL replication for foreign tables,
materialized views, views, etc., where changes to such relations are
currently not supported by logical replication? We should also think
about initial sync for all those objects.
--
With Regards,
Amit Kapila.
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-01-19 03:09:02 |
| Message-ID: | CAAD30UL-QJvDzEJQB-4hJ7RdbPjnCA3JMO7MdOL5GAH9niWBnQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, Jan 18, 2023 at 6:27 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Sat, Jan 7, 2023 at 8:58 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> >
> > I added documentation and changed user interface design in the
> > attached v60 patch set.
> > The patch set addressed comments from Peter in [1].
> >
> > The motivation for the user interface change is that we want to manage
> > DDL replication feature in stages with fine grained replication
> > levels.
> > For example, we can focus on reviewing and testing table commands
> > first, then other commands. It also make sense to introduce different
> > DDL replication levels
> > from the user perspective as pointed out in [1]. We can add more
> > replication levels along the way.
> >
> > In this patch DDL replication is disabled by default and it can be
> > enabled at different levels
> > using the new PUBLICATION option 'ddl'. This option currently has two
> > levels and are
> > only allowed to be set if the PUBLICATION is FOR ALL TABLES.
> >
> > all: this option enables replication of all supported DDL commands.
> > table: this option enables replication of Table DDL commands which include:
> > -CREATE/ALTER/DROP TABLE
> > -CREATE TABLE AS
> >
>
> I think this point needs some thought. When you say 'all', how do you
> think it will help to support DDL replication for foreign tables,
> materialized views, views, etc where changes to such relations are
> currently not supported by logical replication?
I think DDL replication naturally provides support for views and
materialized views if the publication is FOR ALL TABLES, since all the
tables in the view/MV definition are replicated.
Foreign tables can also be considered replicated with DDL replication,
because we don't even need to replicate the data: it resides on the
external server. Users need to configure the external server to allow
connections from the subscriber for foreign tables to work on the
subscriber.
> We should also think
> about initial sync for all those objects as well.
Agreed, we're starting an investigation into initial sync. But I think
initial sync depends on DDL replication to work reliably, not the other
way around. DDL replication can work on its own without the initial
sync of the schema; users just need to set up the initial schema as
they would today.
Regards,
Zane
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-01-19 07:05:38 |
| Message-ID: | CAA4eK1+h-+cxqTXp0YDsErRpoNiaP8Hu4Se7DYGymuh8j9fxHQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, Jan 19, 2023 at 8:39 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> On Wed, Jan 18, 2023 at 6:27 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Sat, Jan 7, 2023 at 8:58 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > >
>
> Foreign Tables can also be considered replicated with DDL replication because we
> don't even need to replicate the data as it resides on the external
> server. Users
> need to configure the external server to allow connection from the
> subscriber for
> foreign tables to work on the subscriber.
>
So, this would mean that we expect the subscriber to also have the
same foreign server as the publisher, because we will replicate the
entire connection/user information of the publisher's foreign server.
But what about data inserted by the publisher on the foreign server?
> > We should also think
> > about initial sync for all those objects as well.
>
> Agree, we're starting an investigation on initial sync. But I think
> initial sync depends on
> DDL replication to work reliably, not the other way around. DDL replication can
> work on its own without the initial sync of schema, users just need to
> setup the initial
> schema just like they would today.
>
The difference is that today users need to take care of all schema
setup on both nodes and keep following the changes made on the
publisher. But with DDL replication, there has to be a point prior to
which both nodes have the same setup. For that, before setting up DDL
replication, users need to ensure that both nodes have the same schema,
and then during setup the user must not perform any DDL on the
publisher.
--
With Regards,
Amit Kapila.
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | Zheng Li <zhengli10(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-01-19 08:51:01 |
| Message-ID: | CALDaNm0dOwbPVrkqw9OLN=TVFdo5aroUNwdjG5SAcKQg93-g0w@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Mon, 16 Jan 2023 at 19:04, Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Mon, Jan 16, 2023 at 2:13 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Sun, 15 Jan 2023 at 09:39, Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Jan 13, 2023 at 5:33 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > > Adding support for CREATE/ALTER/DROP Publication ddl deparsing.
> > > > The attached v61 patch has the changes for the same.
> > > >
> > >
> > > Hi Vignesh,
> > > this doesn't seem to compile:
> > >
> > > gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith
> > > -Wdeclaration-after-statement -Werror=vla -Wendif-labels
> > > -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
> > > -fwrapv -fexcess-precision=standard -g -ggdb -Og -g3
> > > -fno-omit-frame-pointer -I../../../src/include -D_GNU_SOURCE -c -o
> > > ddl_deparse.o ddl_deparse.c
> > > ddl_deparse.c: In function ‘deparse_PublicationObjects’:
> > > ddl_deparse.c:8956:3: error: unknown type name ‘publication_rel’
> > > publication_rel *pub_rel = (publication_rel *) lfirst(lc1);
> > > ^
> > > ddl_deparse.c:8956:31: error: ‘publication_rel’ undeclared (first use
> > > in this function)
> > > publication_rel *pub_rel = (publication_rel *) lfirst(lc1);
> > > ^
> > > ddl_deparse.c:8956:31: note: each undeclared identifier is reported
> > > only once for each function it appears in
> > > ddl_deparse.c:8956:48: error: expected expression before ‘)’ token
> > > publication_rel *pub_rel = (publication_rel *) lfirst(lc1);
> >
> > I was able to apply all the patches and compile successfully, also
> > cfbot as in [1] has compiled successfully. One thing I noticed is that
> > structure is available at 006 patch, if you did not apply all the
> > patch it might fail, we will take care of structure movement in the
> > next version.
> > [1] - https://cirrus-ci.com/task/5231672585617408
> >
>
> Fixing an earlier comment from Peter:
> On Mon, Oct 31, 2022 at 7:07 PM Peter Smith
> <smithpb2250(at)gmail(dot)com> wrote:
> >
> >0. append_object_to_format_string
> >Instead of this little loop why doesn't the code just look for the
> ?name delimiters?
> >
> >e.g.
> >pstart = strch(sub_fmt, '{');
> >pend = strbrk(pstart, ":}");
> >
> >then the 'name' is what lies in between...
>
> made the logic simpler with strchr.
>
> Also:
> 1. fixed a compilation warning seen in publicationcmds.c
> 2. fixed a broken documentation build
> 3. changed a failed build of patch 3 due to missing header.
Hi,
I have fixed the following issues in the attached v63 version of the patch:
1) pg_dump was dumping both the internally generated event triggers and
CREATE PUBLICATION with the ddl option, which resulted in multiple
event triggers being created while restoring; the patch now skips the
dump of internally generated event triggers.
2) Restoring a non-DDL publication was not handled correctly; this is now fixed.
3) There were a few indentation issues when applying the 0003 patch.
These are all handled in the attached v63 version.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v63-0001-Infrastructure-to-support-DDL-deparsing.patch | text/x-patch | 37.5 KB |
| v63-0005-DDL-messaging-infrastructure-for-DDL-replication.patch | text/x-patch | 41.5 KB |
| v63-0004-Introduce-the-test_ddl_deparse_regress-test-modu.patch | text/x-patch | 47.4 KB |
| v63-0003-Support-DDL-deparse-of-the-rest-commands.patch | text/x-patch | 198.7 KB |
| v63-0002-Functions-to-deparse-Table-DDL-commands.patch | text/x-patch | 131.0 KB |
| v63-0007-Document-DDL-replication-and-DDL-deparser.patch | text/x-patch | 40.6 KB |
| v63-0006-Support-DDL-replication.patch | text/x-patch | 210.2 KB |
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-01-19 17:54:38 |
| Message-ID: | CAAD30UJ9cF+xURQH7UQkYK5Xsz5Ryap6gSo6a4T9amkEn_jSiw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, Jan 19, 2023 at 2:05 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Jan 19, 2023 at 8:39 AM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 18, 2023 at 6:27 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Sat, Jan 7, 2023 at 8:58 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
> > > >
> >
> > Foreign Tables can also be considered replicated with DDL replication because we
> > don't even need to replicate the data as it resides on the external
> > server. Users
> > need to configure the external server to allow connection from the
> > subscriber for
> > foreign tables to work on the subscriber.
> >
>
> So, this would mean that we expect the subscriber will also have the
> same foreign server as the publisher because we will replicate the
> entire connection/user information of the foreign server for the
> publisher.
Yes, CREATE/ALTER SERVER commands are also supported by the current
DDL replication patch.
>But what about data inserted by the publisher on the
> foreign server?
I thought the data inserted into a foreign table is always stored on
the foreign server, unless I'm mistaken?
> > > We should also think
> > > about initial sync for all those objects as well.
> >
> > Agree, we're starting an investigation on initial sync. But I think
> > initial sync depends on
> > DDL replication to work reliably, not the other way around. DDL replication can
> > work on its own without the initial sync of schema, users just need to
> > setup the initial
> > schema just like they would today.
> >
>
> The difference is that today users need to take care of all schema
> setup on both and follow changes in the same on the publisher. But
> with DDL replication, there has to be a point prior to which both the
> nodes have the same setup. For that, before setting up DDL
> replication, users need to ensure that both nodes have the same
> schema, and then during setup, the user doesn't perform any DDL on the
> publisher.
The users can perform DDL during the setup if they do the following:
1. Create a logical replication slot to capture changes on the publisher
2. Do a backup for the publisher
3. Restore the backup as the subscriber
4. Advance the logical slot to the last valid LSN of the restore
5. Create pub/sub and use the above logical slot.
Regards,
Zane
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Zheng Li <zhengli10(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-01-23 08:46:52 |
| Message-ID: | CAA4eK1KwV0BSoSa3PXpn=Z9YTZ2ZurZU4FDQ1S5tkKTBwHKRRw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Thu, Jan 19, 2023 at 11:24 PM Zheng Li <zhengli10(at)gmail(dot)com> wrote:
>
> On Thu, Jan 19, 2023 at 2:05 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > >
> > > Foreign Tables can also be considered replicated with DDL replication because we
> > > don't even need to replicate the data as it resides on the external
> > > server. Users
> > > need to configure the external server to allow connection from the
> > > subscriber for
> > > foreign tables to work on the subscriber.
> > >
> >
> > So, this would mean that we expect the subscriber will also have the
> > same foreign server as the publisher because we will replicate the
> > entire connection/user information of the foreign server for the
> > publisher.
>
> Yes, CREATE/ALTER SERVER commands are also supported by the current
> DDL replication patch.
>
> >But what about data inserted by the publisher on the
> > foreign server?
>
> I thought the data inserted to a foreign table will always be stored
> on the foreign server unless I'm mistaken?
>
I have the same understanding. It is not clear to me whether there is
a use case for just allowing the foreign server setup without caring
about data replication. What this will achieve is that both publisher
and subscriber will be allowed to perform operations on the same
foreign server, but I'm not sure that is what users expect or find
useful.
> > > > We should also think
> > > > about initial sync for all those objects as well.
> > >
> > > Agree, we're starting an investigation on initial sync. But I think
> > > initial sync depends on
> > > DDL replication to work reliably, not the other way around. DDL replication can
> > > work on its own without the initial sync of schema, users just need to
> > > setup the initial
> > > schema just like they would today.
> > >
> >
> > The difference is that today users need to take care of all schema
> > setup on both and follow changes in the same on the publisher. But
> > with DDL replication, there has to be a point prior to which both the
> > nodes have the same setup. For that, before setting up DDL
> > replication, users need to ensure that both nodes have the same
> > schema, and then during setup, the user doesn't perform any DDL on the
> > publisher.
>
> The users can perform DDL during the setup if they do the following:
> 1. Create a logical replication slot to capture changes on the publisher
> 2. Do a backup for the publisher
> 3. Restore the backup as the subscriber
> 4. Advance the logical slot to the last valid LSN of the restore
> 5. Create pub/sub and use the above logical slot.
>
Are you talking about basebackup/restore or pg_dump? I have the latter
in mind (with the snapshot option) to achieve it. However, I think it
won't be convenient for users to do those steps by themselves, as there
is a risk of mistakes leading to a wrong setup.
--
With Regards,
Amit Kapila.
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-01-24 03:27:50 |
| Message-ID: | CAAD30UJaP-eEChY-i0uEOn3sNZpqSD7Ma-abDy4s+JvxG=QdHw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
> > Yes, CREATE/ALTER SERVER commands are also supported by the current
> > DDL replication patch.
> >
> > >But what about data inserted by the publisher on the
> > > foreign server?
> >
> > I thought the data inserted to a foreign table will always be stored
> > on the foreign server unless I'm mistaken?
> >
>
> I also have the same understanding. It is not clear to me if there is
> a use case to just allow the foreign server set up without caring for
> data replication. So, what this will achieve is both publisher and
> subscriber will be allowed to perform operations on the same foreign
> server but not sure if that is expected by the user and is useful to
> them.
One use case I can think of is using a logical replica for online
major version upgrade; users would want the foreign server on the
subscriber/replica set up identically to the publisher.
There may be cases where foreign server replication is not needed,
which justifies the motivation for defining fine-grained DDL
replication levels.
>
> > > > > We should also think
> > > > > about initial sync for all those objects as well.
> > > >
> > > > Agree, we're starting an investigation on initial sync. But I think
> > > > initial sync depends on
> > > > DDL replication to work reliably, not the other way around. DDL replication can
> > > > work on its own without the initial sync of schema, users just need to
> > > > setup the initial
> > > > schema just like they would today.
> > > >
> > >
> > > The difference is that today users need to take care of all schema
> > > setup on both and follow changes in the same on the publisher. But
> > > with DDL replication, there has to be a point prior to which both the
> > > nodes have the same setup. For that, before setting up DDL
> > > replication, users need to ensure that both nodes have the same
> > > schema, and then during setup, the user doesn't perform any DDL on the
> > > publisher.
> >
> > The users can perform DDL during the setup if they do the following:
> > 1. Create a logical replication slot to capture changes on the publisher
> > 2. Do a backup for the publisher
> > 3. Restore the backup as the subscriber
> > 4. Advance the logical slot to the last valid LSN of the restore
> > 5. Create pub/sub and use the above logical slot.
> >
>
> Are you talking about basebackup/restore or pg_dump? I have later in
> mind (with the snapshot option) to achieve it. However, I think it
> won't be convenient for users to do those steps by themselves as there
> is a risk of mistakes leading to wrong set up.
I'm talking about basebackup where the last valid LSN can be
identified from the Postgres logs. I don't think we can easily
identify the LSN to advance the slot to with pg_dump/pg_restore since
it's a logical copy.
I agree these steps are not straightforward, but some customers are
doing it themselves.
Regards,
Zane
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-03 00:41:12 |
| Message-ID: | CAHut+PtH9S+3-TGPa4_uipBP-4-Kksx-FrEkg9ZtK0FphVRA5A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Here are some review comments for patch v63-0001.
======
General
1.
(This is not really a review comment - more just an observation...)
This patch seemed mostly like an assortment of random changes that
have nothing in common except that some *later* patches of this set
are apparently going to want them.
Now maybe doing it this way was the best and neatest thing to do --
I'm not sure. But my first impression was that this has gone too far
in some places -- e.g. perhaps some of these changes would have been
better deferred until they are *really* needed instead of plonking a
whole lot of un-called (i.e. untested) code into patch 0001.
======
Commit message
2.
2) Some of the prototype and structures were moved from pg_publication.h
to publicationcmds.h as one of the later patch requires inclusion of
pg_publication.h and these prototype had references to server header
files.
SUGGESTION (?)
2) Some prototypes and structures were moved from pg_publication.h to
publicationcmds.h. This was because one of the later patches required
the inclusion of pg_publication.h and these prototypes had references
to server header files.
======
src/backend/catalog/aclchk.c
3. ExecuteGrantStmt
+ /* Copy the grantor id needed for DDL deparsing of Grant */
+ istmt.grantor_uid = grantor;
+
SUGGESTION (comment)
Copy the grantor id to the parsetree, needed for DDL deparsing of Grant
======
src/backend/catalog/objectaddress.c
4. getObjectIdentityParts
@@ -5922,7 +5922,7 @@ getObjectIdentityParts(const ObjectAddress *object,
transformType = format_type_be_qualified(transform->trftype);
transformLang = get_language_name(transform->trflang, false);
- appendStringInfo(&buffer, "for %s on language %s",
+ appendStringInfo(&buffer, "for %s language %s",
transformType,
transformLang);
There is no clue anywhere as to what this change is for.
Perhaps this ought to be mentioned in the Commit Message.
Alternatively, maybe defer this change until it becomes clearer who needs it?
======
src/backend/commands/collationcmds.c
5.
+ /*
+ * Make from existing collationid available to callers for statement such as
+ * CREATE COLLATION any_name FROM any_name
+ */
+ if (from_existing_collid && OidIsValid(collid))
+ ObjectAddressSet(*from_existing_collid, CollationRelationId, collid);
"for statement such as" --> "for statements such as"
======
src/backend/commands/event_trigger.c
6.
+EventTriggerQueryState *currentEventTriggerState = NULL;
It seems overkill to make this non-static here. I didn't find anybody
using this variable from outside this source, so unless this was a
mistake I guess it's preparing the ground for some future patch.
Either way, it didn't seem like this belonged in patch 0001.
======
src/backend/commands/sequence.c
7.
+Form_pg_sequence_data
+get_sequence_values(Oid sequenceId)
+{
+ Buffer buf;
+ SeqTable elm;
+ Relation seqrel;
+ HeapTupleData seqtuple;
+ Form_pg_sequence_data seq;
+ Form_pg_sequence_data retSeq;
+
+ /* Open and AccessShareLock sequence */
+ init_sequence(sequenceId, &elm, &seqrel);
+
+ if (pg_class_aclcheck(sequenceId, GetUserId(),
+ ACL_SELECT | ACL_UPDATE | ACL_USAGE) != ACLCHECK_OK)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied for sequence %s",
+ RelationGetRelationName(seqrel))));
+
+ seq = read_seq_tuple(seqrel, &buf, &seqtuple);
+ retSeq = palloc(sizeof(FormData_pg_sequence_data));
+
+ memcpy(retSeq, seq, sizeof(FormData_pg_sequence_data));
+
+ UnlockReleaseBuffer(buf);
+ relation_close(seqrel, NoLock);
+
+ return retSeq;
+}
IMO the palloc might be better done up-front when the retSeq was declared.
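For example, just a sketch of the idea:

Form_pg_sequence_data retSeq = palloc(sizeof(FormData_pg_sequence_data));
...
seq = read_seq_tuple(seqrel, &buf, &seqtuple);
memcpy(retSeq, seq, sizeof(FormData_pg_sequence_data));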
======
src/backend/tcop/utility.c
8.
+/*
+ * Return the given object type as a string.
+ */
+const char *
+stringify_objtype(ObjectType objtype, bool isgrant)
+{
+ switch (objtype)
+ {
+ case OBJECT_AGGREGATE:
+ return "AGGREGATE";
+ case OBJECT_CAST:
+ return "CAST";
+ case OBJECT_COLLATION:
+ return "COLLATION";
+ case OBJECT_COLUMN:
+ return isgrant ? "TABLE" : "COLUMN";
+ case OBJECT_CONVERSION:
+ return "CONVERSION";
+ case OBJECT_DATABASE:
+ return "DATABASE";
+ case OBJECT_DOMAIN:
+ return "DOMAIN";
+ case OBJECT_EVENT_TRIGGER:
+ return "EVENT TRIGGER";
+ case OBJECT_EXTENSION:
+ return "EXTENSION";
+ case OBJECT_FDW:
+ return "FOREIGN DATA WRAPPER";
+ case OBJECT_FOREIGN_SERVER:
+ return isgrant ? "FOREIGN SERVER" : "SERVER";
+ case OBJECT_FOREIGN_TABLE:
+ return "FOREIGN TABLE";
That 'isgrant' param seemed a bit hacky.
At least some comment should be given (maybe in the function header?)
to explain why this boolean is modifying the return string.
Or maybe it is better to have another stringify_objtype_for_grant that
just wraps this?
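For example, just a sketch of that wrapper idea (the name is only the
one suggested above; nothing like it exists yet):

/* Wrapper so GRANT/REVOKE deparsing never passes the boolean itself. */
const char *
stringify_objtype_for_grant(ObjectType objtype)
{
    return stringify_objtype(objtype, true);
}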
======
src/backend/utils/adt/regproc.c
9.
+
+/*
+ * Append the parenthesized arguments of the given pg_proc row into the output
+ * buffer. force_qualify indicates whether to schema-qualify type names
+ * regardless of visibility.
+ */
+static void
+format_procedure_args_internal(Form_pg_proc procform, StringInfo buf,
+ bool force_qualify)
+{
+ int i;
+ char* (*func[2])(Oid) = {format_type_be, format_type_be_qualified};
+
+ appendStringInfoChar(buf, '(');
+ for (i = 0; i < procform->pronargs; i++)
+ {
+ Oid thisargtype = procform->proargtypes.values[i];
+ char *argtype = NULL;
+
+ if (i > 0)
+ appendStringInfoChar(buf, ',');
+
+ argtype = func[force_qualify](thisargtype);
+ appendStringInfoString(buf, argtype);
+ pfree(argtype);
+ }
+ appendStringInfoChar(buf, ')');
+}
9a.
Assigning argtype = NULL looks redundant because it will always be
overwritten anyhow.
~
9b.
I understand why this function was put here beside the other static
functions in "Support Routines" but IMO it really belongs nearby (i.e.
directly above) the only caller (format_procedure_args). Keeping
those two functions together will improve the readability of both, and
will also remove the need for the static forward declaration.
======
src/backend/utils/adt/ruleutils.c
10.
+void
+pg_get_ruledef_detailed(Datum ev_qual, Datum ev_action,
+ char **whereClause, List **actions)
+{
+ int prettyFlags = 0;
+ char *qualstr = TextDatumGetCString(ev_qual);
+ char *actionstr = TextDatumGetCString(ev_action);
+ List *actionNodeList = (List *) stringToNode(actionstr);
+ StringInfoData buf;
+
+ *whereClause = NULL;
+ *actions = NIL;
+ initStringInfo(&buf);
+ if (strlen(qualstr) > 0 && strcmp(qualstr, "<>") != 0)
+ {
If you like, that condition could have been written more simply as:
if (*qualstr && strcmp(qualstr, "<>") != 0)
~~~
11.
+/*
+ * Parse back the TriggerWhen clause of a trigger given the
pg_trigger record and
+ * the expression tree (in nodeToString() representation) from
pg_trigger.tgqual
+ * for the trigger's WHEN condition.
+ */
+char *
+pg_get_trigger_whenclause(Form_pg_trigger trigrec, Node *whenClause,
bool pretty)
+{
It seemed "Parse back" is a typo.
I assume it was meant to say something like "Passes back", or maybe
just "Returns" is better.
======
src/include/replication/logicalrelation.h
12.
@@ -14,6 +14,7 @@
#include "access/attmap.h"
#include "replication/logicalproto.h"
+#include "storage/lockdefs.h"
What is this needed here for? I tried without this change and
everything still builds OK.
------
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-03 10:21:39 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On 2023-Feb-03, Peter Smith wrote:
> 1.
> (This is not really a review comment - more just an observation...)
>
> This patch seemed mostly like an assortment of random changes that
> don't seem to have anything in common except that some *later* patches
> of this set are apparently going to want them.
That's true, but from a submitter perspective it is 1000x easier to do
it like this, and for a reviewer these changes are not really very
interesting. By now, given the amount of review effort that needs to go
into this patch (just because it's 800kb of diff), it seems fairly clear
that we cannot get this patch in time for v16, so it doesn't seem a
priority to get this point sorted out. Personally, from a review point
of view, I would still prefer to have it this way rather than having
each change scattered in the individual patch that needs it, so let's
not get too worked up about it at this point. Maybe if we can find some
use for some of these helpers in existing code that allows refactoring
while introducing these new functions, we can add them ahead of
everything else.
> 3. ExecuteGrantStmt
>
> + /* Copy the grantor id needed for DDL deparsing of Grant */
> + istmt.grantor_uid = grantor;
> +
>
> SUGGESTION (comment)
> Copy the grantor id to the parsetree, needed for DDL deparsing of Grant
Is istmt really "the parse tree" actually? As I recall, it's a derived
struct that's created during execution of the grant/revoke command, so
modifying the comment like this would be a mistake.
> @@ -5922,7 +5922,7 @@ getObjectIdentityParts(const ObjectAddress *object,
> transformType = format_type_be_qualified(transform->trftype);
> transformLang = get_language_name(transform->trflang, false);
>
> - appendStringInfo(&buffer, "for %s on language %s",
> + appendStringInfo(&buffer, "for %s language %s",
> transformType,
> transformLang);
>
> There is no clue anywhere what this change was for.
We should get the objectIdentity changes ahead of everything else; I
think these can be qualified as bugs (though I would recommend not
backpatching them.) I think there were two of these.
> 8.
> +/*
> + * Return the given object type as a string.
> + */
> +const char *
> +stringify_objtype(ObjectType objtype, bool isgrant)
> +{
> That 'isgrant' param seemed a bit hacky.
>
> At least some comment should be given (maybe in the function header?)
> to explain why this boolean is modifying the return string.
>
> Or maybe it is better to have another stringify_objtype_for_grant that
> just wraps this?
... I don't remember writing this code, but it's probably my fault (was
it 7 years ago now?). Maybe we can find a different approach that
doesn't need yet another list of object types? (If I did write it,) we
have a lot more infrastructure now than we had back then, I think.
In any case it doesn't seem like a function called "stringify_objtype"
with this signature makes sense as an exported function, much less in
utility.c.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"But static content is just dynamic content that isn't moving!"
http://smylers.hates-software.com/2007/08/15/fe244d0c.html
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-03 13:27:02 |
| Message-ID: | CAFPTHDbXuO5_hDk_rDYCnXFs0WcYB4-mnP6Wr7QS9ekB+MHhTw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
There's a bug in this patch: if there are multiple publications that
publish DDLs, then multiple event triggers will write multiple WAL
records, resulting in duplicate DDL statements being published.
To avoid this, I've added a fix in the event trigger invoking logic
(EventTriggerCommonSetup()) in patch-6 to check for duplicate
publication DDL deparsing event trigger functions; if there are
duplicates, only the first event trigger function is invoked and the
rest are ignored.
This way, even if there are multiple publications that publish DDLs,
only one set of event trigger functions is invoked.
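The check is roughly along these lines (a simplified sketch, not the
exact patch code, and assuming the event trigger cache item exposes the
trigger function OID as fnoid):

/* Skip an event trigger function that is already in the run list. */
if (list_member_oid(runlist, item->fnoid))
    continue;
runlist = lappend_oid(runlist, item->fnoid);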
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v64-0001-Infrastructure-to-support-DDL-deparsing.patch | application/octet-stream | 37.3 KB |
| v64-0004-Introduce-the-test_ddl_deparse_regress-test-modu.patch | application/octet-stream | 47.4 KB |
| v64-0005-DDL-messaging-infrastructure-for-DDL-replication.patch | application/octet-stream | 41.2 KB |
| v64-0002-Functions-to-deparse-Table-DDL-commands.patch | application/octet-stream | 130.9 KB |
| v64-0003-Support-DDL-deparse-of-the-rest-commands.patch | application/octet-stream | 198.6 KB |
| v64-0006-Support-DDL-replication.patch | application/octet-stream | 212.0 KB |
| v64-0007-Document-DDL-replication-and-DDL-deparser.patch | application/octet-stream | 40.6 KB |
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-05 20:49:32 |
| Message-ID: | CAHut+PsK-P4=3fqJSeVtmNovNRxdm1997LOak+AVBfuSLGjrJA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Feb 3, 2023 at 9:21 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2023-Feb-03, Peter Smith wrote:
>
...
> > 3. ExecuteGrantStmt
> >
> > + /* Copy the grantor id needed for DDL deparsing of Grant */
> > + istmt.grantor_uid = grantor;
> > +
> >
> > SUGGESTION (comment)
> > Copy the grantor id to the parsetree, needed for DDL deparsing of Grant
>
> Is istmt really "the parse tree" actually? As I recall, it's a derived
> struct that's created during execution of the grant/revoke command, so
> modifying the comment like this would be a mistake.
>
I thought this comment was analogous to another one from this same
patch 0001 (see seclabel.c), so the suggested change above was simply
to make the wording consistent.
@@ -134,6 +134,9 @@ ExecSecLabelStmt(SecLabelStmt *stmt)
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("must specify provider when multiple security label providers
have been loaded")));
provider = (LabelProvider *) linitial(label_provider_list);
+
+ /* Copy the provider name to the parsetree, needed for DDL deparsing
of SecLabelStmt */
+ stmt->provider = pstrdup(provider->provider_name);
So if the suggestion for the ExecuteGrantStmt comment was a mistake
then perhaps the ExecSecLabelStmt comment is wrong also?
------
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-06 01:16:58 |
| Message-ID: | CAHut+Psc=ntPznJNuAngt40SqEEpnH6=OZdrQnNOJBFL77yjFw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Here are some comments for patch v63-0002.
This is a WIP because I have not yet looked at the large file - ddl_deparse.c.
======
Commit Message
1.
This patch provides JSON blobs representing DDL commands, which can
later be re-processed into plain strings by well-defined sprintf-like
expansion. These JSON objects are intended to allow for machine-editing of
the commands, by replacing certain nodes within the objects.
~
"This patch provides JSON blobs" --> "This patch constructs JSON blobs"
======
src/backend/commands/ddl_json.
2. Copyright
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
"2022" --> "2023"
~~~
3.
+/*
+ * Conversion specifier which determines how we expand the JSON element into
+ * string.
+ */
+typedef enum
+{
+ SpecTypeName,
+ SpecOperatorName,
+ SpecDottedName,
+ SpecString,
+ SpecNumber,
+ SpecStringLiteral,
+ SpecIdentifier,
+ SpecRole
+} convSpecifier;
~
3a.
SUGGESTION (comment)
Conversion specifier which determines how to expand the JSON element
into a string.
~
3b.
Are these enums in this strange order deliberately? If not, then maybe
alphabetical is better.
~~~
4. Forward declaration
+char *deparse_ddl_json_to_string(char *jsonb);
Why is this forward declared here? Isn't this already declared extern
in ddl_deparse.h?
~~~
5. expand_fmt_recursive
+/*
+ * Recursive helper for deparse_ddl_json_to_string.
+ *
+ * Find the "fmt" element in the given container, and expand it into the
+ * provided StringInfo.
+ */
+static void
+expand_fmt_recursive(JsonbContainer *container, StringInfo buf)
I noticed all the other expand_XXXX functions are passing the
StringInfo buf as the first parameter. For consistency, shouldn’t this
be the same?
~
6.
+ if (*cp != '%')
+ {
+ appendStringInfoCharMacro(buf, *cp);
+ continue;
+ }
+
+
+ ADVANCE_PARSE_POINTER(cp, end_ptr);
+
+ /* Easy case: %% outputs a single % */
+ if (*cp == '%')
+ {
+ appendStringInfoCharMacro(buf, *cp);
+ continue;
+ }
Double blank lines?
~
7.
+ ADVANCE_PARSE_POINTER(cp, end_ptr);
+ for (; cp < end_ptr;)
+ {
Maybe a while loop is more appropriate?
~
8.
+ value = findJsonbValueFromContainer(container, JB_FOBJECT, &key);
Should the code be checking or asserting value is not NULL?
(IIRC I asked this a long time ago - sorry if it was already answered)
~~~
9. expand_jsonval_dottedname
It might be simpler code to use a variable like:
JsonbContainer *data = jsonval->val.binary.data;
Instead of repeating jsonval->val.binary.data many times.
~~~
10. expand_jsonval_typename
It might be simpler code to use a variable like:
JsonbContainer *data = jsonval->val.binary.data;
Instead of repeating jsonval->val.binary.data many times.
~~~
11.
+/*
+ * Expand a JSON value as an operator name.
+ */
+static void
+expand_jsonval_operator(StringInfo buf, JsonbValue *jsonval)
Should this function comment be more like the comment for
expand_jsonval_dottedname by saying there can be an optional
"schemaname"?
~~~
12.
+static bool
+expand_jsonval_string(StringInfo buf, JsonbValue *jsonval)
+{
+ if (jsonval->type == jbvString)
+ {
+ appendBinaryStringInfo(buf, jsonval->val.string.val,
+ jsonval->val.string.len);
+ }
+ else if (jsonval->type == jbvBinary)
+ {
+ json_trivalue present;
+
+ present = find_bool_in_jsonbcontainer(jsonval->val.binary.data,
+ "present");
+
+ /*
+ * If "present" is set to false, this element expands to empty;
+ * otherwise (either true or absent), fall through to expand "fmt".
+ */
+ if (present == tv_false)
+ return false;
+
+ expand_fmt_recursive(jsonval->val.binary.data, buf);
+ }
+ else
+ return false;
+
+ return true;
+}
I felt this could be simpler if there is a new 'expanded' variable
because then you can have just a single return point instead of 3
returns;
If you choose to do this there is a minor tweak to the "fall through" comment.
SUGGESTION
expand_jsonval_string(StringInfo buf, JsonbValue *jsonval)
{
bool expanded = true;
if (jsonval->type == jbvString)
{
appendBinaryStringInfo(buf, jsonval->val.string.val,
jsonval->val.string.len);
}
else if (jsonval->type == jbvBinary)
{
json_trivalue present;
present = find_bool_in_jsonbcontainer(jsonval->val.binary.data,
"present");
/*
* If "present" is set to false, this element expands to empty;
* otherwise (either true or absent), expand "fmt".
*/
if (present == tv_false)
expanded = false;
else
expand_fmt_recursive(jsonval->val.binary.data, buf);
}
return expanded;
}
~~~
13.
+/*
+ * Expand a JSON value as an integer quantity.
+ */
+static void
+expand_jsonval_number(StringInfo buf, JsonbValue *jsonval)
+{
Should this also be checking/asserting that the type is jbvNumeric?
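e.g. just a one-line sketch:

Assert(jsonval->type == jbvNumeric);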
~~~
14.
+/*
+ * Expand a JSON value as a role name. If the is_public element is set to
+ * true, PUBLIC is expanded (no quotes); otherwise, expand the given role name,
+ * quoting as an identifier.
+ */
+static void
+expand_jsonval_role(StringInfo buf, JsonbValue *jsonval)
Maybe more readable to quote that param?
BEFORE
If the is_public element is set...
AFTER
If the 'is_public' element is set...
~~~
15.
+ *
+ * Returns false if no actual expansion was made (due to the "present" flag
+ * being set to "false" in formatted string expansion).
+ */
+static bool
+expand_one_jsonb_element(StringInfo buf, char *param, JsonbValue *jsonval,
+ convSpecifier specifier, const char *fmt)
+{
+ bool result = true;
+ ErrorContextCallback sqlerrcontext;
~
15a.
Looking at the implementation, maybe that comment can be made more
clear. Something like below:
SUGGESTION
Returns true, except for the formatted string case if no actual
expansion was made (due to the "present" flag being set to "false").
~
15b.
Maybe use a better variable name.
"result" --> "string_expanded"
======
src/include/catalog/pg_proc.dat
16.
@@ -11891,4 +11891,10 @@
prorettype => 'bytea', proargtypes => 'pg_brin_minmax_multi_summary',
prosrc => 'brin_minmax_multi_summary_send' },
+{ oid => '4642', descr => 'deparse the DDL command into JSON format string',
+ proname => 'ddl_deparse_to_json', prorettype => 'text',
+ proargtypes => 'pg_ddl_command', prosrc => 'ddl_deparse_to_json' },
+{ oid => '4643', descr => 'expand JSON format DDL to a plain DDL command',
+ proname => 'ddl_deparse_expand_command', prorettype => 'text',
+ pr
16a.
"deparse the DDL command into JSON format string" ==> "deparse the DDL
command into a JSON format string"
~
16b.
"expand JSON format DDL to a plain DDL command" --> "expand JSON
format DDL to a plain text DDL command"
======
src/include/tcop/ddl_deparse.h
17.
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
"2022" --> "2023"
~~~
+extern char *deparse_utility_command(CollectedCommand *cmd, bool verbose_mode);
+extern char *deparse_ddl_json_to_string(char *jsonb);
+extern char *deparse_drop_command(const char *objidentity, const char
*objecttype,
+ DropBehavior behavior);
+
+
+#endif /* DDL_DEPARSE_H */
Double blank lines.
======
src/include/tcop/deparse_utility.h
18.
@@ -100,6 +103,12 @@ typedef struct CollectedCommand
{
ObjectType objtype;
} defprivs;
+
+ struct
+ {
+ ObjectAddress address;
+ Node *real_create;
+ } ctas;
} d;
All the other sub-structures have comments. IMO this one should have a
comment too.
------
Kind Regards,
Peter Smith.
Fujitsu Australia.
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-06 07:07:40 |
| Message-ID: | CAFPTHDZpRUnRktDC3FrMcD_PBBmD8-Fd+0og6NP2mabqXqKF1A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Sat, Feb 4, 2023 at 12:27 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> There's a bug in this patch: if there are multiple publications that
> publish DDLs, then multiple event triggers will write multiple WAL
> records, resulting in duplicate DDL statements being published.
> To avoid this, I've added a fix in the event trigger invoking logic
> (EventTriggerCommonSetup()) in patch-6 to check for duplicate
> publication DDL deparsing event trigger functions; if there are
> duplicates, only the first event trigger function is invoked and the
> rest are ignored.
> This way, even if there are multiple publications that publish DDLs,
> only one set of event trigger functions is invoked.
>
The above change caused a test case failure; the attached version fixes it.
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v65-0001-Infrastructure-to-support-DDL-deparsing.patch | application/octet-stream | 37.3 KB |
| v65-0002-Functions-to-deparse-Table-DDL-commands.patch | application/octet-stream | 130.9 KB |
| v65-0005-DDL-messaging-infrastructure-for-DDL-replication.patch | application/octet-stream | 41.2 KB |
| v65-0004-Introduce-the-test_ddl_deparse_regress-test-modu.patch | application/octet-stream | 47.4 KB |
| v65-0003-Support-DDL-deparse-of-the-rest-commands.patch | application/octet-stream | 198.6 KB |
| v65-0006-Support-DDL-replication.patch | application/octet-stream | 212.0 KB |
| v65-0007-Document-DDL-replication-and-DDL-deparser.patch | application/octet-stream | 40.6 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-06 11:32:31 |
| Message-ID: | CALDaNm1G6Oj6n5CvqDLrNcus3=YJ-RzjMQO1TcSreHdHmwcR_A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Mon, 6 Feb 2023 at 06:47, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are some comments for patch v63-0002.
>
> This is a WIP because I have not yet looked at the large file - ddl_deparse.c.
>
> ======
> Commit Message
>
> 1.
> This patch provides JSON blobs representing DDL commands, which can
> later be re-processed into plain strings by well-defined sprintf-like
> expansion. These JSON objects are intended to allow for machine-editing of
> the commands, by replacing certain nodes within the objects.
>
> ~
>
> "This patch provides JSON blobs" --> "This patch constructs JSON blobs"
>
> ======
> src/backend/commands/ddl_json.
Modified
> 2. Copyright
>
> + * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
>
> "2022" --> "2023"
>
Modified
>
> 3.
> +/*
> + * Conversion specifier which determines how we expand the JSON element into
> + * string.
> + */
> +typedef enum
> +{
> + SpecTypeName,
> + SpecOperatorName,
> + SpecDottedName,
> + SpecString,
> + SpecNumber,
> + SpecStringLiteral,
> + SpecIdentifier,
> + SpecRole
> +} convSpecifier;
>
> ~
>
> 3a.
> SUGGESTION (comment)
> Conversion specifier which determines how to expand the JSON element
> into a string.
>
Modified
>
> 3b.
> Are these enums in this strange order deliberately? If not, then maybe
> alphabetical is better.
>
Modified
>
> 4. Forward declaration
>
> +char *deparse_ddl_json_to_string(char *jsonb);
>
> Why is this forward declared here? Isn't this already declared extern
> in ddl_deparse.h?
>
Modified
>
> 5. expand_fmt_recursive
>
> +/*
> + * Recursive helper for deparse_ddl_json_to_string.
> + *
> + * Find the "fmt" element in the given container, and expand it into the
> + * provided StringInfo.
> + */
> +static void
> +expand_fmt_recursive(JsonbContainer *container, StringInfo buf)
>
> I noticed all the other expand_XXXX functions are passing the
> StringInfo buf as the first parameter. For consistency, shouldn’t this
> be the same?
>
Modified
>
> 6.
> + if (*cp != '%')
> + {
> + appendStringInfoCharMacro(buf, *cp);
> + continue;
> + }
> +
> +
> + ADVANCE_PARSE_POINTER(cp, end_ptr);
> +
> + /* Easy case: %% outputs a single % */
> + if (*cp == '%')
> + {
> + appendStringInfoCharMacro(buf, *cp);
> + continue;
> + }
>
> Double blank lines?
>
Modified
>
> 7.
> + ADVANCE_PARSE_POINTER(cp, end_ptr);
> + for (; cp < end_ptr;)
> + {
>
>
> Maybe a while loop is more appropriate?
>
Modified
>
> 8.
> + value = findJsonbValueFromContainer(container, JB_FOBJECT, &key);
>
> Should the code be checking or asserting value is not NULL?
>
> (IIRC I asked this a long time ago - sorry if it was already answered)
>
Yes, this was already answered by Zheng in [1]: "The null checking
for value is done in the upcoming call of expand_one_jsonb_element()."
>
> 9. expand_jsonval_dottedname
>
> It might be simpler code to use a variable like:
> JsonbContainer *data = jsonval->val.binary.data;
>
> Instead of repeating jsonval->val.binary.data many times.
>
Modified
>
> 10. expand_jsonval_typename
>
> It might be simpler code to use a variable like:
> JsonbContainer *data = jsonval->val.binary.data;
>
> Instead of repeating jsonval->val.binary.data many times.
>
Modified
>
> 11.
> +/*
> + * Expand a JSON value as an operator name.
> + */
> +static void
> +expand_jsonval_operator(StringInfo buf, JsonbValue *jsonval)
>
> Should this function comment be more like the comment for
> expand_jsonval_dottedname by saying there can be an optional
> "schemaname"?
Modified
> ~~~
>
> 12.
> +static bool
> +expand_jsonval_string(StringInfo buf, JsonbValue *jsonval)
> +{
> + if (jsonval->type == jbvString)
> + {
> + appendBinaryStringInfo(buf, jsonval->val.string.val,
> + jsonval->val.string.len);
> + }
> + else if (jsonval->type == jbvBinary)
> + {
> + json_trivalue present;
> +
> + present = find_bool_in_jsonbcontainer(jsonval->val.binary.data,
> + "present");
> +
> + /*
> + * If "present" is set to false, this element expands to empty;
> + * otherwise (either true or absent), fall through to expand "fmt".
> + */
> + if (present == tv_false)
> + return false;
> +
> + expand_fmt_recursive(jsonval->val.binary.data, buf);
> + }
> + else
> + return false;
> +
> + return true;
> +}
>
> I felt this could be simpler if there is a new 'expanded' variable
> because then you can have just a single return point instead of 3
> returns;
>
> If you choose to do this there is a minor tweak to the "fall through" comment.
>
> SUGGESTION
> expand_jsonval_string(StringInfo buf, JsonbValue *jsonval)
> {
> bool expanded = true;
>
> if (jsonval->type == jbvString)
> {
> appendBinaryStringInfo(buf, jsonval->val.string.val,
> jsonval->val.string.len);
> }
> else if (jsonval->type == jbvBinary)
> {
> json_trivalue present;
>
> present = find_bool_in_jsonbcontainer(jsonval->val.binary.data,
> "present");
>
> /*
> * If "present" is set to false, this element expands to empty;
> * otherwise (either true or absent), expand "fmt".
> */
> if (present == tv_false)
> expanded = false;
> else
> expand_fmt_recursive(jsonval->val.binary.data, buf);
> }
>
> return expanded;
> }
I'm not sure if this change is required, as it will introduce a new
variable that has to be set; that variable would also need to be set to
"expanded = false" in an else branch after the else if. I preferred the
existing code, so I did not make any change for this, unless you are
seeing some bigger optimization.
> 13.
> +/*
> + * Expand a JSON value as an integer quantity.
> + */
> +static void
> +expand_jsonval_number(StringInfo buf, JsonbValue *jsonval)
> +{
>
> Should this also be checking/asserting that the type is jbvNumeric?
Modified
>
> 14.
> +/*
> + * Expand a JSON value as a role name. If the is_public element is set to
> + * true, PUBLIC is expanded (no quotes); otherwise, expand the given role name,
> + * quoting as an identifier.
> + */
> +static void
> +expand_jsonval_role(StringInfo buf, JsonbValue *jsonval)
>
> Maybe more readable to quote that param?
>
> BEFORE
> If the is_public element is set...
>
> AFTER
> If the 'is_public' element is set...
>
> ~~~
>
> 15.
> + *
> + * Returns false if no actual expansion was made (due to the "present" flag
> + * being set to "false" in formatted string expansion).
> + */
> +static bool
> +expand_one_jsonb_element(StringInfo buf, char *param, JsonbValue *jsonval,
> + convSpecifier specifier, const char *fmt)
> +{
> + bool result = true;
> + ErrorContextCallback sqlerrcontext;
Modified
>
> 15a.
> Looking at the implementation, maybe that comment can be made more
> clear. Something like below:
>
> SUGGESTION
> Returns true, except for the formatted string case if no actual
> expansion was made (due to the "present" flag being set to "false").
Modified
> 15b.
> Maybe use a better variable name.
>
> "result" --> "string_expanded"
Modified
> ======
> src/include/catalog/pg_proc.dat
>
> 16.
> @@ -11891,4 +11891,10 @@
> prorettype => 'bytea', proargtypes => 'pg_brin_minmax_multi_summary',
> prosrc => 'brin_minmax_multi_summary_send' },
>
> +{ oid => '4642', descr => 'deparse the DDL command into JSON format string',
> + proname => 'ddl_deparse_to_json', prorettype => 'text',
> + proargtypes => 'pg_ddl_command', prosrc => 'ddl_deparse_to_json' },
> +{ oid => '4643', descr => 'expand JSON format DDL to a plain DDL command',
> + proname => 'ddl_deparse_expand_command', prorettype => 'text',
> + pr
>
> 16a.
> "deparse the DDL command into JSON format string" ==> "deparse the DDL
> command into a JSON format string"
Modified
> 16b.
> "expand JSON format DDL to a plain DDL command" --> "expand JSON
> format DDL to a plain text DDL command"
Modified
> src/include/tcop/ddl_deparse.h
>
> 17.
> + * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
>
> "2022" --> "2023"
Modified
> +extern char *deparse_utility_command(CollectedCommand *cmd, bool verbose_mode);
> +extern char *deparse_ddl_json_to_string(char *jsonb);
> +extern char *deparse_drop_command(const char *objidentity, const char
> *objecttype,
> + DropBehavior behavior);
> +
> +
> +#endif /* DDL_DEPARSE_H */
>
> Double blank lines.
Modified
> ======
> src/include/tcop/deparse_utility.h
>
> 18.
> @@ -100,6 +103,12 @@ typedef struct CollectedCommand
> {
> ObjectType objtype;
> } defprivs;
> +
> + struct
> + {
> + ObjectAddress address;
> + Node *real_create;
> + } ctas;
> } d;
>
> All the other sub-structures have comments. IMO this one should have a
> comment too.
Modified
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v66-0001-Infrastructure-to-support-DDL-deparsing.patch | text/x-patch | 37.5 KB |
| v66-0004-Introduce-the-test_ddl_deparse_regress-test-modu.patch | text/x-patch | 47.4 KB |
| v66-0005-DDL-messaging-infrastructure-for-DDL-replication.patch | text/x-patch | 41.5 KB |
| v66-0002-Functions-to-deparse-Table-DDL-commands.patch | text/x-patch | 131.0 KB |
| v66-0003-Support-DDL-deparse-of-the-rest-commands.patch | text/x-patch | 198.7 KB |
| v66-0006-Support-DDL-replication.patch | text/x-patch | 212.0 KB |
| v66-0007-Document-DDL-replication-and-DDL-deparser.patch | text/x-patch | 40.6 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-07 12:05:10 |
| Message-ID: | CALDaNm1E-5C-jj-FK6EuNn8w=75aFWeWrNF4U+uQJ0LbLM9fLg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Mon, 6 Feb 2023 at 17:02, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Mon, 6 Feb 2023 at 06:47, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > Here are some comments for patch v63-0002.
> >
> > This is a WIP because I have not yet looked at the large file - ddl_deparse.c.
> >
> > ======
> > Commit Message
> >
> > 1.
> > This patch provides JSON blobs representing DDL commands, which can
> > later be re-processed into plain strings by well-defined sprintf-like
> > expansion. These JSON objects are intended to allow for machine-editing of
> > the commands, by replacing certain nodes within the objects.
> >
> > ~
> >
> > "This patch provides JSON blobs" --> "This patch constructs JSON blobs"
> >
> > ======
> > src/backend/commands/ddl_json.
>
> Modified
>
I found a few issues while testing:
Issue 1: core dump
Steps to reproduce:
CREATE TABLE lock_tbl1 (a BIGINT);
CREATE TABLE lock_tbl1a (a BIGINT);
CREATE VIEW lock_view1 AS SELECT * FROM lock_tbl1;
CREATE VIEW lock_view2(a,b) AS SELECT * FROM lock_tbl1, lock_tbl1a;
CREATE VIEW lock_view3 AS SELECT * from lock_view2;
CREATE OR REPLACE VIEW lock_view2 AS SELECT * from lock_view3;
Stack trace for the same:
#5 0x00005573000128ac in ExceptionalCondition
(conditionName=0x5573001830a3 "IsA(stmt, AlterTableStmt)",
fileName=0x5573001821de "ddl_deparse.c", lineNumber=2840) at
assert.c:66
#6 0x00005572ffa8ddef in deparse_AlterRelation (cmd=0x557301038ec8)
at ddl_deparse.c:2840
#7 0x00005572ffaa1895 in deparse_utility_command (cmd=0x557301038ec8,
verbose_mode=false) at ddl_deparse.c:9820
#8 0x00005572ffd6daeb in publication_deparse_ddl_command_end
(fcinfo=0x7fff3eba50b0) at ddltrigger.c:203
#9 0x00005572ffaa7f87 in EventTriggerInvoke
(fn_oid_list=0x557301033d80, trigdata=0x7fff3eba5110) at
event_trigger.c:1047
#10 0x00005572ffaa7769 in EventTriggerDDLCommandEnd
(parsetree=0x557300f5b548) at event_trigger.c:719
#11 0x00005572ffe33a22 in ProcessUtilitySlow (pstate=0x5573010458b8,
pstmt=0x557300f5b618, queryString=0x557300f5a7c8 "CREATE OR REPLACE
VIEW lock_view2 AS SELECT * from lock_view3;",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x557300f5b8d8, qc=0x7fff3eba5910) at utility.c:1933
Issue 2: unsupported object type error
Steps to reproduce:
create table t1(c1 int);
ALTER TABLE t1 ADD CONSTRAINT onek_check_constraint CHECK (c1 >= 0);
ALTER TABLE t1 RENAME CONSTRAINT onek_check_constraint TO
onek_check_constraint_foo;
Issue 3: object name not found error
Steps to reproduce:
create type xfloat4;
create function xfloat4in(cstring) returns xfloat4 immutable strict
language internal as 'int4in';
create function xfloat4out(xfloat4) returns cstring immutable strict
language internal as 'int4out';
CREATE TYPE xfloat4 (
internallength = 16,
input = xfloat4in,
output = xfloat4out,
element = int4,
category = 'x', -- just to verify the system will take it
preferred = true -- ditto
);
Issue 4: unsupported alter table subtype 18
Steps to reproduce:
create type comptype as (r float8, i float8);
create domain dcomptype as comptype;
alter domain dcomptype add constraint c1 check ((value).r > 0);
alter type comptype alter attribute r type bigint;
Issue 5: unsupported object type 13
Steps to reproduce:
create domain testdomain1 as int constraint unsigned check (value > 0);
alter domain testdomain1 rename constraint unsigned to unsigned_foo;
Issue 6: invalid ObjTree element type
Steps to reproduce:
create extension file_fdw;
CREATE FOREIGN DATA WRAPPER foo;
alter foreign data wrapper foo HANDLER file_fdw_handler;
WARNING: changing the foreign-data wrapper handler can change
behavior of existing foreign tables
ERROR: invalid ObjTree element type 1693984336
Issue 7: no owned sequence found
Steps to reproduce:
CREATE TABLE itest13 (a int);
ALTER TABLE itest13 ADD COLUMN b int GENERATED BY DEFAULT AS IDENTITY;
Issue 8: could not find tuple for constraint 0
Steps to reproduce:
create table p1(f1 int);
create table p1_c1() inherits(p1);
alter table p1 add constraint inh_check_constraint1 check (f1 > 0);
alter table p1_c1 add constraint inh_check_constraint1 check (f1 > 0);
Issue 9: unsupported object type 38
Steps to reproduce:
CREATE SUBSCRIPTION regress_testsub CONNECTION
'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
false);
COMMENT ON SUBSCRIPTION regress_testsub IS 'test subscription';
Regards,
Vignesh
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-08 22:16:49 |
| Message-ID: | CAHut+PvWxg1sMzPy+adTJ3PT4hDeGV3pMBWPdGE=desbU7QDbg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hi Vignesh, thanks for addressing my v63-0002 review comments.
I confirmed most of the changes. Below is a quick follow-up for the
remaining ones.
On Mon, Feb 6, 2023 at 10:32 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Mon, 6 Feb 2023 at 06:47, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
...
> >
> > 8.
> > + value = findJsonbValueFromContainer(container, JB_FOBJECT, &key);
> >
> > Should the code be checking or asserting value is not NULL?
> >
> > (IIRC I asked this a long time ago - sorry if it was already answered)
> >
>
> Yes, this was already answered by Zheng in [1]: "The null checking
> for value is done in the upcoming call of expand_one_jsonb_element()."
Thanks for the info. I saw that Zheng-san only wrote it is handled in
the “upcoming call of expand_one_jsonb_element”, but I don’t know if
that is sufficient. For example, if the execution heads down the other
path (expand_jsonb_array) with a NULL jsonarr then it is going to
crash, isn't it? So I still think some change may be needed here.
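For example, a simple guard of this kind (only a sketch; the exact
wording and placement are up for discussion) would make the failure
explicit:

value = findJsonbValueFromContainer(container, JB_FOBJECT, &key);
if (value == NULL)
    elog(ERROR, "missing element in JSON blob");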
> > 11.
> > +/*
> > + * Expand a JSON value as an operator name.
> > + */
> > +static void
> > +expand_jsonval_operator(StringInfo buf, JsonbValue *jsonval)
> >
> > Should this function comment be more like the comment for
> > expand_jsonval_dottedname by saying there can be an optional
> > "schemaname"?
>
> Modified
Is it really OK for the "objname" to be optional here? (Yes, I know the
code is currently implemented as though that is OK, but I am doubtful.)
That would mean everything can be optional and the buf result might be
nothing. It could also mean that if the "schemaname" is provided but
the "objname" is not, then the buf will have a trailing ".".
It doesn't sound quite right to me.
>
> > ~~~
> >
> > 12.
> > +static bool
> > +expand_jsonval_string(StringInfo buf, JsonbValue *jsonval)
> > +{
> > + if (jsonval->type == jbvString)
> > + {
> > + appendBinaryStringInfo(buf, jsonval->val.string.val,
> > + jsonval->val.string.len);
> > + }
> > + else if (jsonval->type == jbvBinary)
> > + {
> > + json_trivalue present;
> > +
> > + present = find_bool_in_jsonbcontainer(jsonval->val.binary.data,
> > + "present");
> > +
> > + /*
> > + * If "present" is set to false, this element expands to empty;
> > + * otherwise (either true or absent), fall through to expand "fmt".
> > + */
> > + if (present == tv_false)
> > + return false;
> > +
> > + expand_fmt_recursive(jsonval->val.binary.data, buf);
> > + }
> > + else
> > + return false;
> > +
> > + return true;
> > +}
> >
> > I felt this could be simpler if there is a new 'expanded' variable
> > because then you can have just a single return point instead of 3
> > returns;
> >
> > If you choose to do this there is a minor tweak to the "fall through" comment.
> >
> > SUGGESTION
> > expand_jsonval_string(StringInfo buf, JsonbValue *jsonval)
> > {
> > bool expanded = true;
> >
> > if (jsonval->type == jbvString)
> > {
> > appendBinaryStringInfo(buf, jsonval->val.string.val,
> > jsonval->val.string.len);
> > }
> > else if (jsonval->type == jbvBinary)
> > {
> > json_trivalue present;
> >
> > present = find_bool_in_jsonbcontainer(jsonval->val.binary.data,
> > "present");
> >
> > /*
> > * If "present" is set to false, this element expands to empty;
> > * otherwise (either true or absent), expand "fmt".
> > */
> > if (present == tv_false)
> > expanded = false;
> > else
> > expand_fmt_recursive(jsonval->val.binary.data, buf);
> > }
> >
> > return expanded;
> > }
>
> I'm not sure if this change is required, as it will introduce a new
> variable that has to be set; that variable would also need to be set to
> "expanded = false" in an else branch after the else if. I preferred the
> existing code, so I did not make any change for this, unless you are
> seeing some bigger optimization.
>
Sorry, I messed up the previous code suggestion. It should have said:
SUGGESTION
expand_jsonval_string(StringInfo buf, JsonbValue *jsonval)
{
bool expanded = false;
if (jsonval->type == jbvString)
{
appendBinaryStringInfo(buf, jsonval->val.string.val,
jsonval->val.string.len);
expanded = true;
}
else if (jsonval->type == jbvBinary)
{
json_trivalue present;
present =
find_bool_in_jsonbcontainer(jsonval->val.binary.data, "present");
/*
* If "present" is set to false, this element expands to empty;
* otherwise (either true or absent), expand "fmt".
*/
if (present != tv_false)
{
expand_fmt_recursive(jsonval->val.binary.data, buf);
expanded = true;
}
}
return expanded;
}
~
But I have no special "optimization" in mind. Only, IMO the code is
easier to understand, because:
- 1 return is simpler than 3 returns
- 1 else is simpler than 2 else's
YMMV.
------
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-09 09:55:17 |
| Message-ID: | CAFPTHDbZzCeYMPJn0iFuD_ggpY-0ZHfVBHgQ9VJ6v4dF59xang@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Feb 3, 2023 at 11:41 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are some review comments for patch v63-0001.
>
> ======
> General
>
> 1.
> (This is not really a review comment - more just an observation...)
>
> This patch seemed mostly like an assortment of random changes that
> don't seem to have anything in common except that some *later* patches
> of this set are apparently going to want them.
>
> Now maybe doing it this way was the best and neatest thing to do --
> I'm not sure. But my first impression was I felt this has gone too far
> in some places -- e.g. perhaps some of these changes would have been
> better deferred until they are *really* needed instead of just
> plonking a whole lot of un-called (i.e. untested) code into patch
> 0001.
>
>
Alvaro has replied to this.
> ======
> Commit message
>
> 2.
> 2) Some of the prototype and structures were moved from pg_publication.h
> to publicationcmds.h as one of the later patch requires inclusion of
> pg_publication.h and these prototype had references to server header
> files.
>
> SUGGESTION (?)
> 2) Some prototypes and structures were moved from pg_publication.h to
> publicationcmds.h. This was because one of the later patches required
> the inclusion of pg_publication.h and these prototypes had references
> to server header files.
>
Changed.
>
> ======
> src/backend/catalog/aclchk.c
>
> 3. ExecuteGrantStmt
>
> + /* Copy the grantor id needed for DDL deparsing of Grant */
> + istmt.grantor_uid = grantor;
> +
>
> SUGGESTION (comment)
> Copy the grantor id to the parsetree, needed for DDL deparsing of Grant
>
didn't change this, as Alvaro said this was not a parsetree.
> ======
> src/backend/catalog/objectaddress.c
>
> 4. getObjectIdentityParts
>
> @@ -5922,7 +5922,7 @@ getObjectIdentityParts(const ObjectAddress *object,
> transformType = format_type_be_qualified(transform->trftype);
> transformLang = get_language_name(transform->trflang, false);
>
> - appendStringInfo(&buffer, "for %s on language %s",
> + appendStringInfo(&buffer, "for %s language %s",
> transformType,
> transformLang);
>
> There is no clue anywhere what this change was for.
>
> Perhaps this ought to be mentioned in the Commit Message.
>
added this in the commit message.
>
> ======
> src/backend/commands/collationcmds.c
>
> 5.
> + /*
> + * Make from existing collationid available to callers for statement such as
> + * CREATE COLLATION any_name FROM any_name
> + */
> + if (from_existing_collid && OidIsValid(collid))
> + ObjectAddressSet(*from_existing_collid, CollationRelationId, collid);
>
> "for statement such as" --> "for statements such as"
>
changed.
> ======
> src/backend/commands/event_trigger.c
>
> 6.
> +EventTriggerQueryState *currentEventTriggerState = NULL;
>
> It seems overkill to make this non-static here. I didn't find anybody
> using this variable from outside this source, so unless this was a
> mistake I guess it's preparing the ground for some future patch.
> Either way, it didn't seem like this belonged in patch 0001.
>
The idea is to use this as a preparatory patch.
> ======
> src/backend/commands/sequence.c
>
> 7.
> +Form_pg_sequence_data
> +get_sequence_values(Oid sequenceId)
> +{
> + Buffer buf;
> + SeqTable elm;
> + Relation seqrel;
> + HeapTupleData seqtuple;
> + Form_pg_sequence_data seq;
> + Form_pg_sequence_data retSeq;
> +
> + /* Open and AccessShareLock sequence */
> + init_sequence(sequenceId, &elm, &seqrel);
> +
> + if (pg_class_aclcheck(sequenceId, GetUserId(),
> + ACL_SELECT | ACL_UPDATE | ACL_USAGE) != ACLCHECK_OK)
> + ereport(ERROR,
> + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> + errmsg("permission denied for sequence %s",
> + RelationGetRelationName(seqrel))));
> +
> + seq = read_seq_tuple(seqrel, &buf, &seqtuple);
> + retSeq = palloc(sizeof(FormData_pg_sequence_data));
> +
> + memcpy(retSeq, seq, sizeof(FormData_pg_sequence_data));
> +
> + UnlockReleaseBuffer(buf);
> + relation_close(seqrel, NoLock);
> +
> + return retSeq;
> +}
>
> IMO the palloc might be better done up-front when the retSeq was declared.
>
changed.
> ======
> src/backend/tcop/utility.c
>
> 8.
> +/*
> + * Return the given object type as a string.
> + */
> +const char *
> +stringify_objtype(ObjectType objtype, bool isgrant)
> +{
> + switch (objtype)
> + {
> + case OBJECT_AGGREGATE:
> + return "AGGREGATE";
> + case OBJECT_CAST:
> + return "CAST";
> + case OBJECT_COLLATION:
> + return "COLLATION";
> + case OBJECT_COLUMN:
> + return isgrant ? "TABLE" : "COLUMN";
> + case OBJECT_CONVERSION:
> + return "CONVERSION";
> + case OBJECT_DATABASE:
> + return "DATABASE";
> + case OBJECT_DOMAIN:
> + return "DOMAIN";
> + case OBJECT_EVENT_TRIGGER:
> + return "EVENT TRIGGER";
> + case OBJECT_EXTENSION:
> + return "EXTENSION";
> + case OBJECT_FDW:
> + return "FOREIGN DATA WRAPPER";
> + case OBJECT_FOREIGN_SERVER:
> + return isgrant ? "FOREIGN SERVER" : "SERVER";
> + case OBJECT_FOREIGN_TABLE:
> + return "FOREIGN TABLE";
>
> That 'isgrant' param seemed a bit hacky.
>
> At least some comment should be given (maybe in the function header?)
> to explain why this boolean is modifying the return string.
>
added comment in the function header.
> ======
> src/backend/utils/adt/regproc.c
>
> 9.
> +
> +/*
> + * Append the parenthesized arguments of the given pg_proc row into the output
> + * buffer. force_qualify indicates whether to schema-qualify type names
> + * regardless of visibility.
> + */
> +static void
> +format_procedure_args_internal(Form_pg_proc procform, StringInfo buf,
> + bool force_qualify)
> +{
> + int i;
> + char* (*func[2])(Oid) = {format_type_be, format_type_be_qualified};
> +
> + appendStringInfoChar(buf, '(');
> + for (i = 0; i < procform->pronargs; i++)
> + {
> + Oid thisargtype = procform->proargtypes.values[i];
> + char *argtype = NULL;
> +
> + if (i > 0)
> + appendStringInfoChar(buf, ',');
> +
> + argtype = func[force_qualify](thisargtype);
> + appendStringInfoString(buf, argtype);
> + pfree(argtype);
> + }
> + appendStringInfoChar(buf, ')');
> +}
>
> 9a.
> Assigning argtype = NULL looks redundant because it will always be
> overwritten anyhow.
>
changed this.
> ~
>
> 9b.
> I understand why this function was put here beside the other static
> functions in "Support Routines" but IMO it really belongs nearby (i.e.
> directly above) the only caller (format_procedure_args). Keeping
> those two functions together will improve the readability of both, and
> will also remove the need for the static forward declaration.
>
> ======
> src/backend/utils/adt/ruleutils.c
>
> 10.
> +void
> +pg_get_ruledef_detailed(Datum ev_qual, Datum ev_action,
> + char **whereClause, List **actions)
> +{
> + int prettyFlags = 0;
> + char *qualstr = TextDatumGetCString(ev_qual);
> + char *actionstr = TextDatumGetCString(ev_action);
> + List *actionNodeList = (List *) stringToNode(actionstr);
> + StringInfoData buf;
> +
> + *whereClause = NULL;
> + *actions = NIL;
> + initStringInfo(&buf);
> + if (strlen(qualstr) > 0 && strcmp(qualstr, "<>") != 0)
> + {
>
> If you like, that condition could have been written more simply as:
>
> if (*qualstr && strcmp(qualstr, "<>") != 0)
>
fixed.
> ~~~
>
> 11.
> +/*
> + * Parse back the TriggerWhen clause of a trigger given the
> pg_trigger record and
> + * the expression tree (in nodeToString() representation) from
> pg_trigger.tgqual
> + * for the trigger's WHEN condition.
> + */
> +char *
> +pg_get_trigger_whenclause(Form_pg_trigger trigrec, Node *whenClause,
> bool pretty)
> +{
>
> It seemed "Parse back" is a typo.
>
> I assume it was meant to say something like "Passes back", or maybe
> just "Returns" is better.
fixed.
>
> ======
> src/include/replication/logicalrelation.h
>
> 12.
> @@ -14,6 +14,7 @@
>
> #include "access/attmap.h"
> #include "replication/logicalproto.h"
> +#include "storage/lockdefs.h"
>
> What is this needed here for? I tried without this change and
> everything still builds OK.
>
fixed.
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v67-0001-Infrastructure-to-support-DDL-deparsing.patch | application/octet-stream | 37.1 KB |
| v67-0005-DDL-messaging-infrastructure-for-DDL-replication.patch | application/octet-stream | 41.6 KB |
| v67-0004-Introduce-the-test_ddl_deparse_regress-test-modu.patch | application/octet-stream | 47.4 KB |
| v67-0002-Functions-to-deparse-Table-DDL-commands.patch | application/octet-stream | 131.0 KB |
| v67-0003-Support-DDL-deparse-of-the-rest-commands.patch | application/octet-stream | 198.6 KB |
| v67-0007-Document-DDL-replication-and-DDL-deparser.patch | application/octet-stream | 40.6 KB |
| v67-0006-Support-DDL-replication.patch | application/octet-stream | 212.0 KB |
| From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-09 10:55:22 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
I happened to notice that MINVFUNC in 0003 displays like this
"fmt": "MINVFUNC==%{type}T",
in some cases; this appears in the JSON that's emitted by the regression
tests at some point. How can we detect this kind of thing, so that
these mistakes become self-evident? I thought the intention of the
regress module was to run the deparsed code, so the syntax error should
have become obvious.
...
Oh, I see the problem. There are two 'fmt' lines for that clause (and
many others), one of which is used when the clause is not present. So
there's never a syntax error, because this one never expands other than
to empty.
AFAICS this defeats the purpose of the 'present' field. I mean, if the
clause is never going to be deparsed, then why have it there in the first place?
If we want to have it, then it has to be correct.
I think we should design the code to avoid the repetition, because that
has an inherent risk of typo bugs and such. Maybe we should set forth a
policy that each 'fmt' string should appear in the source code only
once. So instead of this
+ /* MINVFUNC */
+ if (OidIsValid(agg->aggminvtransfn))
+ tmp = new_objtree_VA("MINVFUNC=%{type}T", 1,
+ "type", ObjTypeObject,
+ new_objtree_for_qualname_id(ProcedureRelationId,
+ agg->aggminvtransfn));
+ else
+ {
+ tmp = new_objtree("MINVFUNC==%{type}T");
+ append_bool_object(tmp, "present", false);
+ }
we would have something like
tmp = new_objtree("MINVFUNC=%{type}T");
if (OidIsValid(agg->aggminvtransfn))
{
append_bool_object(tmp, "present", true);
append...(tmp, "type", new_objtree_for_qualname_id(ProcedureRelationId, agg->aggminvtransfn));
}
else
{
append_bool_object(tmp, "present", false);
}
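To make that shape harder to get wrong, the construction could even be
hidden behind a tiny helper -- only a sketch, and the helper name is
invented:

/*
 * Build the objtree for an optional clause, so that each fmt string
 * exists in exactly one place and "present" is always set explicitly.
 */
static ObjTree *
new_objtree_optional(const char *fmt, bool present)
{
	ObjTree    *tmp = new_objtree(fmt);

	append_bool_object(tmp, "present", present);
	return tmp;
}

Each call site would then mention the fmt string exactly once and append
the clause arguments only when the referenced object is valid.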
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"No necesitamos banderas
No reconocemos fronteras" (Jorge González)
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-10 11:06:52 |
| Message-ID: | CAA4eK1JGLCd+do-KurYrjnSxLwFNHf4bA7bL_r5Xc=ubVeVZpw@mail.gmail.com |
| Lists: | pgsql-general pgsql-hackers |
On Thu, Feb 9, 2023 at 3:25 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
Comments on 0001 and 0002
=======================
1.
* CREATE COLLATION
*/
ObjectAddress
-DefineCollation(ParseState *pstate, List *names, List *parameters,
bool if_not_exists)
+DefineCollation(ParseState *pstate, List *names, List *parameters,
+ bool if_not_exists, ObjectAddress *from_existing_collid)
I think it is better to expand the function header comments to explain
the return values, especially from_existing_collid.
2.
+Form_pg_sequence_data
+get_sequence_values(Oid sequenceId)
+{
+ Buffer buf;
+ SeqTable elm;
+ Relation seqrel;
+ HeapTupleData seqtuple;
+ Form_pg_sequence_data seq;
+ Form_pg_sequence_data retSeq = palloc(sizeof(FormData_pg_sequence_data));
+
+ /* Open and AccessShareLock sequence */
+ init_sequence(sequenceId, &elm, &seqrel);
The comment above init_sequence() seems wrong to me. AFAICS, we
acquire RowExclusiveLock in init_sequence() via
lock_and_open_sequence(). Am I missing anything?
3.
+get_sequence_values(Oid sequenceId)
...
...
+
+ if (pg_class_aclcheck(sequenceId, GetUserId(),
+ ACL_SELECT | ACL_UPDATE | ACL_USAGE) != ACLCHECK_OK)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
Why do we need to check UPDATE privilege just for reading the values?
4. I checked the callers of get_sequence_values and they just need
'last_val' but we still expose all values from Form_pg_sequence_data,
not sure if that is required. In deparse_CreateSeqStmt(), we use it to
append RESTART but the CREATE SEQUENCE syntax in docs doesn't have a
RESTART clause, so I am confused as to why the patch appends the same.
If it is really required then please add the comments for the same.
5. In deparse_CreateSeqStmt() and deparse_ColumnIdentity(), we open
SequenceRelationId, is that really required? Isn't locking the
sequence relation sufficient as is done by get_sequence_values()?
Also, I see that deparse_CreateSeqStmt() opens and locks the sequence
whereas deparse_ColumnIdentity() doesn't do the same. Then, we unlock
the relation in some cases but not in others (like get_sequence_values
uses NoLock whereas others release the lock while closing the rel).
6. In get_sequence_values(), we check the permissions whereas callers
just assume that it is done and don't check it while accessing the
sequence. This makes the code a bit unclear.
7. After seeing the above inconsistencies, I am wondering whether it would
be better to design get_sequence_values() such that it returns both the
sequence parameters and last_val in a structure and the callers use
it. That would be a bit cleaner and avoid opening the relation multiple
times. (A rough sketch of what I mean is at the end of these comments.)
8.
+/*
+ * Return the given object type as a string.
+ * If isgrant is true, then this function is called
+ * while deparsing GRANT statement and some object
+ * names are replaced.
+ */
+const char *
+stringify_objtype(ObjectType objtype, bool isgrant)
Have an empty line after the Return line. The line length appears too short.
9. Similar to stringify_grant_objtype() and
stringify_adefprivs_objtype(), shall we keep the list of all
unsupported types in stringify_objtype()? That will help us to easily
identify which objects are not yet supported.
10. In pg_get_ruledef_detailed(), the code to form a string for qual
and action is mostly the same as what we have in make_ruledef(). I
think we can extract the common code into a separate function to avoid
missing the code updates at one of the places. I see that
'special_exprkind' is present in one place and not in other, it may be
that over time, we have added new things to deparse_context which
doesn't get updated in the patch. Also, I noticed that for
CMD_NOTHING, the patch just ignores the action whereas the core code
does append the definition. We should check whether such a difference
is required and if so, then add comments for the same.
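For comment 7, something like the following is roughly what I have in
mind (untested sketch; the struct and function names are invented):

/* Combined result, so that callers open the relation only once */
typedef struct SequenceInfo
{
	FormData_pg_sequence		params;	/* seqstart, seqincrement, seqmin, seqmax, ... */
	FormData_pg_sequence_data	data;	/* last_value, log_cnt, is_called */
} SequenceInfo;

/*
 * Open and lock the sequence once, check SELECT/USAGE privilege, fill in
 * both the pg_sequence options and the current state, then return.
 */
extern SequenceInfo *get_sequence_info(Oid sequenceId);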
--
With Regards,
Amit Kapila.
| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-10 14:52:41 |
| Message-ID: | CAD21AoDf3M09f1cqKbJFQmNwx=QvrodABoBMq-ecFoVXQNUHpQ@mail.gmail.com |
| Lists: | pgsql-general pgsql-hackers |
Hi,
On Thu, Feb 9, 2023 at 6:55 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
(v67)
I have some questions about adding the infrastructure for DDL deparsing.
Apart from the changes made by 0001 patch to add infrastructure for
DDL deparsing, 0002 patch seems to add some variables that are not
used in 0002 patch:
@@ -2055,6 +2055,7 @@ typedef struct AlterTableStmt
List *cmds; /* list of subcommands */
ObjectType objtype; /* type of object */
bool missing_ok; /* skip error if table
missing */
+ bool table_like; /* internally generated for
TableLikeClause */
} AlterTableStmt;
@@ -39,6 +40,7 @@ typedef struct CollectedATSubcmd
{
ObjectAddress address; /* affected column,
constraint, index, ... */
Node *parsetree;
+ char *usingexpr;
} CollectedATSubcmd;
typedef struct CollectedCommand
@@ -62,6 +64,7 @@ typedef struct CollectedCommand
{
Oid objectId;
Oid classId;
+ bool rewrite;
List *subcmds;
} alterTable;
These three variables are used in 0006 patch.
Looking at 0006 patch (Support DDL replication), it seems to me that
it includes not only DDL replication support but also changes for the
event trigger. For instance, the patch adds
EventTriggerAlterTypeStart() and EventTriggerAlterTypeEnd(). If these
changes are required for DDL deparse, should we include them in 0001
patch? Perhaps the same is true for
EventTriggerCollectCreatePublication() and friends. IIUC the DDL
deparse and DDL replication are independent features, so I think 0006
patch should not include any changes for DDL deparse infrastructure.
Also, 0003 and 0006 patches introduce SCT_Create/AlterPublication and
change DDL deparse so that it deparses CREATE/ALTER PUBLICATION in a
different way from other simple commands. Is there any reason for
that? I mean, since EventTriggerCollectCreatePublication() collects
the information from the parse tree, I wonder if we could use
SCT_Simple for them.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-10 16:20:55 |
| Message-ID: | CALDaNm2ck3c-UDx5QfzJgMDu9rzVa-tj+UGrMDLWBJ020_5wvg@mail.gmail.com |
| Lists: | pgsql-general pgsql-hackers |
On Thu, 9 Feb 2023 at 03:47, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Vignesh, thanks for addressing my v63-0002 review comments.
>
> I confirmed most of the changes. Below is a quick follow-up for the
> remaining ones.
>
> On Mon, Feb 6, 2023 at 10:32 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Mon, 6 Feb 2023 at 06:47, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> ...
> > >
> > > 8.
> > > + value = findJsonbValueFromContainer(container, JB_FOBJECT, &key);
> > >
> > > Should the code be checking or asserting value is not NULL?
> > >
> > > (IIRC I asked this a long time ago - sorry if it was already answered)
> > >
> >
> > Yes, this was already answered by Zheng, quoting as "The null checking
> > for value is done in the upcoming call of expand_one_jsonb_element()."
> > in [1]
>
> Thanks for the info. I saw that Zheng-san only wrote it is handled in
> the “upcoming call of expand_one_jsonb_element”, but I don’t know if
> that is sufficient. For example, if the execution heads down the other
> path (expand_jsonb_array) with a NULL jsonarr then it going to crash,
> isn't it? So I still think some change may be needed here.
Added an Assert for this.
> > > 11.
> > > +/*
> > > + * Expand a JSON value as an operator name.
> > > + */
> > > +static void
> > > +expand_jsonval_operator(StringInfo buf, JsonbValue *jsonval)
> > >
> > > Should this function comment be more like the comment for
> > > expand_jsonval_dottedname by saying there can be an optional
> > > "schemaname"?
> >
> > Modified
>
> Is it really OK for the "objname" to be optional here? (Yes, I know the
> code is currently implemented as if it is OK, but I am doubtful.)
>
> That would mean everything could be optional and the buf result might be
> nothing. It could also mean that if the "schemaname" is provided but the
> "objname" is not, then the buf will have a trailing ".".
>
> It doesn't sound quite right to me.
I checked that we have specified operator names everywhere, so added
error handling for this case and modified the function header
accordingly.
> > > ~~~
> > >
> > > 12.
> > > +static bool
> > > +expand_jsonval_string(StringInfo buf, JsonbValue *jsonval)
> > > +{
> > > + if (jsonval->type == jbvString)
> > > + {
> > > + appendBinaryStringInfo(buf, jsonval->val.string.val,
> > > + jsonval->val.string.len);
> > > + }
> > > + else if (jsonval->type == jbvBinary)
> > > + {
> > > + json_trivalue present;
> > > +
> > > + present = find_bool_in_jsonbcontainer(jsonval->val.binary.data,
> > > + "present");
> > > +
> > > + /*
> > > + * If "present" is set to false, this element expands to empty;
> > > + * otherwise (either true or absent), fall through to expand "fmt".
> > > + */
> > > + if (present == tv_false)
> > > + return false;
> > > +
> > > + expand_fmt_recursive(jsonval->val.binary.data, buf);
> > > + }
> > > + else
> > > + return false;
> > > +
> > > + return true;
> > > +}
> > >
> > > I felt this could be simpler if there is a new 'expanded' variable
> > > because then you can have just a single return point instead of 3
> > > returns;
> > >
> > > If you choose to do this there is a minor tweak to the "fall through" comment.
> > >
> > > SUGGESTION
> > > expand_jsonval_string(StringInfo buf, JsonbValue *jsonval)
> > > {
> > > bool expanded = true;
> > >
> > > if (jsonval->type == jbvString)
> > > {
> > > appendBinaryStringInfo(buf, jsonval->val.string.val,
> > > jsonval->val.string.len);
> > > }
> > > else if (jsonval->type == jbvBinary)
> > > {
> > > json_trivalue present;
> > >
> > > present = find_bool_in_jsonbcontainer(jsonval->val.binary.data,
> > > "present");
> > >
> > > /*
> > > * If "present" is set to false, this element expands to empty;
> > > * otherwise (either true or absent), expand "fmt".
> > > */
> > > if (present == tv_false)
> > > expanded = false;
> > > else
> > > expand_fmt_recursive(jsonval->val.binary.data, buf);
> > > }
> > >
> > > return expanded;
> > > }
> >
> > I'm not sure if this change is required, as it will introduce a new
> > variable and require it to be set; the variable would also have to be
> > set to "expand = false" in the else branch after the else if. Instead I
> > preferred the existing code. I did not make any change for this, unless
> > you are seeing some bigger optimization.
> >
>
> Sorry, I messed up the previous code suggestion. It should have said:
>
> SUGGESTION
> expand_jsonval_string(StringInfo buf, JsonbValue *jsonval)
> {
> bool expanded = false;
>
> if (jsonval->type == jbvString)
> {
> appendBinaryStringInfo(buf, jsonval->val.string.val,
> jsonval->val.string.len);
> expanded = true;
> }
> else if (jsonval->type == jbvBinary)
> {
> json_trivalue present;
> present =
> find_bool_in_jsonbcontainer(jsonval->val.binary.data, "present");
>
> /*
> * If "present" is set to false, this element expands to empty;
> * otherwise (either true or absent), expand "fmt".
> */
> if (present != tv_false)
> {
> expand_fmt_recursive(jsonval->val.binary.data, buf);
> expanded = true;
> }
> }
> return expanded;
> }
This looks better; I have included this change.
The attached v68 version patch has the changes for the same.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v68-0005-DDL-messaging-infrastructure-for-DDL-replication.patch | text/x-patch | 41.9 KB |
| v68-0001-Infrastructure-to-support-DDL-deparsing.patch | text/x-patch | 37.3 KB |
| v68-0002-Functions-to-deparse-Table-DDL-commands.patch | text/x-patch | 131.3 KB |
| v68-0004-Introduce-the-test_ddl_deparse_regress-test-modu.patch | text/x-patch | 47.4 KB |
| v68-0003-Support-DDL-deparse-of-the-rest-commands.patch | text/x-patch | 201.3 KB |
| v68-0006-Support-DDL-replication.patch | text/x-patch | 212.3 KB |
| v68-0007-Document-DDL-replication-and-DDL-deparser.patch | text/x-patch | 40.6 KB |
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-10 16:24:11 |
| Message-ID: | CALDaNm1cOe=3uNw+st9Y_mBqsebSB6HOjNcrZy6w=vxZFj8c5A@mail.gmail.com |
| Lists: | pgsql-general pgsql-hackers |
On Tue, 7 Feb 2023 at 17:35, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Mon, 6 Feb 2023 at 17:02, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Mon, 6 Feb 2023 at 06:47, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > > Here are some comments for patch v63-0002.
> > >
> > > This is a WIP because I have not yet looked at the large file - ddl_deparse.c.
> > >
> > > ======
> > > Commit Message
> > >
> > > 1.
> > > This patch provides JSON blobs representing DDL commands, which can
> > > later be re-processed into plain strings by well-defined sprintf-like
> > > expansion. These JSON objects are intended to allow for machine-editing of
> > > the commands, by replacing certain nodes within the objects.
> > >
> > > ~
> > >
> > > "This patch provides JSON blobs" --> "This patch constructs JSON blobs"
> > >
> > > ======
> > > src/backend/commands/ddl_json.
> >
> > Modified
> >
>
> I found a few issues while testing:
> Issue 1: core dump
> Steps to reproduce:
> CREATE TABLE lock_tbl1 (a BIGINT);
> CREATE TABLE lock_tbl1a (a BIGINT);
> CREATE VIEW lock_view1 AS SELECT * FROM lock_tbl1;
> CREATE VIEW lock_view2(a,b) AS SELECT * FROM lock_tbl1, lock_tbl1a;
> CREATE VIEW lock_view3 AS SELECT * from lock_view2;
> CREATE OR REPLACE VIEW lock_view2 AS SELECT * from lock_view3;
>
> Stack trace for the same:
> #5 0x00005573000128ac in ExceptionalCondition
> (conditionName=0x5573001830a3 "IsA(stmt, AlterTableStmt)",
> fileName=0x5573001821de "ddl_deparse.c", lineNumber=2840) at
> assert.c:66
> #6 0x00005572ffa8ddef in deparse_AlterRelation (cmd=0x557301038ec8)
> at ddl_deparse.c:2840
> #7 0x00005572ffaa1895 in deparse_utility_command (cmd=0x557301038ec8,
> verbose_mode=false) at ddl_deparse.c:9820
> #8 0x00005572ffd6daeb in publication_deparse_ddl_command_end
> (fcinfo=0x7fff3eba50b0) at ddltrigger.c:203
> #9 0x00005572ffaa7f87 in EventTriggerInvoke
> (fn_oid_list=0x557301033d80, trigdata=0x7fff3eba5110) at
> event_trigger.c:1047
> #10 0x00005572ffaa7769 in EventTriggerDDLCommandEnd
> (parsetree=0x557300f5b548) at event_trigger.c:719
> #11 0x00005572ffe33a22 in ProcessUtilitySlow (pstate=0x5573010458b8,
> pstmt=0x557300f5b618, queryString=0x557300f5a7c8 "CREATE OR REPLACE
> VIEW lock_view2 AS SELECT * from lock_view3;",
> context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
> dest=0x557300f5b8d8, qc=0x7fff3eba5910) at utility.c:1933
Fixed this
> Issue 2: unsupported object type error
> Steps to reproduce:
> create table t1(c1 int);
> ALTER TABLE t1 ADD CONSTRAINT onek_check_constraint CHECK (c1 >= 0);
> ALTER TABLE t1 RENAME CONSTRAINT onek_check_constraint TO
> onek_check_constraint_foo;
Fixed
> Issue 3: object name not found error
> Steps to reproduce:
> create type xfloat4;
> create function xfloat4in(cstring) returns xfloat4 immutable strict
> language internal as 'int4in';
> create function xfloat4out(xfloat4) returns cstring immutable strict
> language internal as 'int4out';
> CREATE TYPE xfloat4 (
> internallength = 16,
> input = xfloat4in,
> output = xfloat4out,
> element = int4,
> category = 'x', -- just to verify the system will take it
> preferred = true -- ditto
> );
Fixed
> Issue 4: unsupported alter table subtype 18
> Steps to reproduce:
> create type comptype as (r float8, i float8);
> create domain dcomptype as comptype;
> alter domain dcomptype add constraint c1 check ((value).r > 0);
> alter type comptype alter attribute r type bigint;
Fixed
> Issue 5: unsupported object type 13
> Steps to reproduce:
> create domain testdomain1 as int constraint unsigned check (value > 0);
> alter domain testdomain1 rename constraint unsigned to unsigned_foo;
Fixed
> Issue 6: invalid ObjTree element type
> Steps to reproduce:
> create extension file_fdw;
> CREATE FOREIGN DATA WRAPPER foo;
> alter foreign data wrapper foo HANDLER file_fdw_handler;
> WARNING: changing the foreign-data wrapper handler can change
> behavior of existing foreign tables
> ERROR: invalid ObjTree element type 1693984336
Fixed
> Issue 7: no owned sequence found
> Steps to reproduce:
> CREATE TABLE itest13 (a int);
> ALTER TABLE itest13 ADD COLUMN b int GENERATED BY DEFAULT AS IDENTITY;
This is slightly tricky; I will fix this in one of the later versions.
> Issue 8: could not find tuple for constraint 0
> Steps to reproduce:
> create table p1(f1 int);
> create table p1_c1() inherits(p1);
> alter table p1 add constraint inh_check_constraint1 check (f1 > 0);
> alter table p1_c1 add constraint inh_check_constraint1 check (f1 > 0);
Fixed
> Issue 9: unsupported object type 38
> Steps to reproduce:
> CREATE SUBSCRIPTION regress_testsub CONNECTION
> 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
> false);
> COMMENT ON SUBSCRIPTION regress_testsub IS 'test subscription';
Fixed
Fixes for this are available in the v68 version attached at [1].
[1] - https://www.postgresql.org/message-id/CALDaNm2ck3c-UDx5QfzJgMDu9rzVa-tj%2BUGrMDLWBJ020_5wvg%40mail.gmail.com
Regards,
Vignesh
| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-10 16:31:24 |
| Message-ID: | CALDaNm3SNOFYHYYcFNhoKwhsFAPyLjkNF5FqTwh=2W4tbZVcfg@mail.gmail.com |
| Lists: | pgsql-general pgsql-hackers |
On Fri, 10 Feb 2023 at 21:50, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> The attached v68 version patch has the changes for the same.
I was not sure if we should support ddl replication of
create/alter/drop subscription commands as there might be some data
inconsistency issues in the following cases:
#node1 who is running in port 5432
create publication pub_node1 for all tables with ( PUBLISH = 'insert,
update, delete, truncate');
#node2 who is running in port 5433
create publication pub_node2 for all tables with(PUBLISH = 'insert,
update, delete, truncate', ddl = 'all');
create subscription sub_node2 connection 'dbname=postgres host=node1
port=5432' publication pub_node1;
#node3
create subscription sub_node3 connection 'dbname=postgres host=node2
port=5433' publication pub_node2;
#node1
create table t1(c1 int );
#node2
create table t1(c1 int);
alter subscription sub_node2 refresh publication;
# Additionally this command will be replicated to node3, creating a
subscription sub2_node2 in node3 which will subscribe data from node1
create subscription sub2_node2 connection 'dbname=postgres host=node1
port=5432' publication pub_node1;
After this, any insert into t1 from node1 will be replicated to node2
and node3; additionally, node2's replicated data (which was replicated
from node1) will also be sent to node3, causing inconsistency. If the
table has unique or primary key constraints, it will lead to an error.
Another option would be to replicate the create subscription in a
disabled state and not support ddl replication of the few alter
subscription commands that connect to the publisher, like:
1) Alter subscription sub1 enable;
2) Alter subscription sub1 refresh publication;
But in this case also, we would end up supporting some alter
subscription commands and not supporting others.
I feel it is better that we do not support ddl replication of the
create/alter/drop subscription commands and let users handle the
subscription commands themselves.
Thoughts?
Regards,
Vignesh
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-13 04:20:29 |
| Message-ID: | CAAD30UKNGbDXWuG1-zs64U6G5cdUJQYf4zyJZDPMfan5pJ1rng@mail.gmail.com |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Feb 10, 2023 at 11:31 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Fri, 10 Feb 2023 at 21:50, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > The attached v68 version patch has the changes for the same.
>
> I was not sure if we should support ddl replication of
> create/alter/drop subscription commands as there might be some data
> inconsistency issues in the following cases:
> #node1 who is running in port 5432
> create publication pub_node1 for all tables with ( PUBLISH = 'insert,
> update, delete, truncate');
>
> #node2 who is running in port 5433
> create publication pub_node2 for all tables with(PUBLISH = 'insert,
> update, delete, truncate', ddl = 'all');
> create subscription sub_node2 connection 'dbname=postgres host=node1
> port=5432' publication pub_node1;
>
> #node3
> create subscription sub_node3 connection 'dbname=postgres host=node2
> port=5433 publication pub_node2;
>
> #node1
> create table t1(c1 int );
>
> #node2
> create table t1(c1 int);
> alter subscription sub_node2 refresh publication;
>
> # Additionally this command will be replicated to node3, creating a
> subscription sub2_node2 in node3 which will subscribe data from node1
> create subscription sub2_node2 connection 'dbname=postgres host=node1
> port=5432' publication pub_node1;
>
> After this any insert into t1 from node1 will be replicated to node2
> and node3, additionally node2's replicated data(which was replicated
> from node1) will also be sent to node3 causing inconsistency. If the
> table has unique or primary key constraints, it will lead to an error.
>
> Another option would be to replicate the create subscription in
> disabled state and not support few ddl replication of alter
> subscription which will connect to publisher like:
> 1) Alter subscription sub1 enable;
> 2) Alter subscription sub1 refresh publication;
I think it will also be error-prone when the user tries to enable the
replicated subscription on node3 later on, for example, when switching
over from node2 to node3.
There is a risk of duplicate or missing data on node3 if the switchover
isn't done right.
> But in this case also, we will be able to support few alter
> subscription commands and not support few alter subscription commands.
> I feel it is better that we do not need to support ddl replication of
> create/alter/drop subscription command and let users handle the
> subscription commands.
+1 for not supporting subscription commands in the first version and
letting users handle them.
Regards,
Zane
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-13 04:55:39 |
| Message-ID: | CAHut+PuTsFOnj=yF4i0QzVU6JW6jaNi48sM3_Vr=K3CJb0FEOg@mail.gmail.com |
| Lists: | pgsql-general pgsql-hackers |
On Sat, Feb 11, 2023 at 3:31 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Fri, 10 Feb 2023 at 21:50, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > The attached v68 version patch has the changes for the same.
>
> I was not sure if we should support ddl replication of
> create/alter/drop subscription commands as there might be some data
> inconsistency issues in the following cases:
> #node1 who is running in port 5432
> create publication pub_node1 for all tables with ( PUBLISH = 'insert,
> update, delete, truncate');
>
> #node2 who is running in port 5433
> create publication pub_node2 for all tables with(PUBLISH = 'insert,
> update, delete, truncate', ddl = 'all');
> create subscription sub_node2 connection 'dbname=postgres host=node1
> port=5432' publication pub_node1;
>
> #node3
> create subscription sub_node3 connection 'dbname=postgres host=node2
> port=5433 publication pub_node2;
>
> #node1
> create table t1(c1 int );
>
> #node2
> create table t1(c1 int);
> alter subscription sub_node2 refresh publication;
>
> # Additionally this command will be replicated to node3, creating a
> subscription sub2_node2 in node3 which will subscribe data from node1
> create subscription sub2_node2 connection 'dbname=postgres host=node1
> port=5432' publication pub_node1;
>
> After this any insert into t1 from node1 will be replicated to node2
> and node3, additionally node2's replicated data(which was replicated
> from node1) will also be sent to node3 causing inconsistency. If the
> table has unique or primary key constraints, it will lead to an error.
>
> Another option would be to replicate the create subscription in
> disabled state and not support few ddl replication of alter
> subscription which will connect to publisher like:
> 1) Alter subscription sub1 enable;
> 2) Alter subscription sub1 refresh publication;
>
> But in this case also, we will be able to support few alter
> subscription commands and not support few alter subscription commands.
> I feel it is better that we do not need to support ddl replication of
> create/alter/drop subscription command and let users handle the
> subscription commands.
> Thoughts?
>
So essentially, node3 is subscribed 2x from the same table in node1
node1 ----> node2
  |           |  ddl
  |           V
  +--------> node3
I think the suggested options are:
option #1. If you support CREATE SUBSCRIPTION DDL replication then you
can end up with the conflict troubles you described above
option #2. If you support CREATE SUBSCRIPTION DDL replication but only
make sure it is disabled first then your above scenario might be OK
but you will also need to *not* allow DDL replication of the ALTER
SUBSCRIPTION which might cause it to become re-enabled. IMO adding
tricky rules is just inviting more problems.
option #3. Do nothing, don't support it. +1 but see below for a
variation of this
~
Actually, I am sort of expecting lots of potential difficulties with
DDL replication and this CREATE SUBSCRIPTION problem is just one of
them. IMO it would be a mistake to try and make the first version of
these patches try to do *everything*. E.g. Why invent tricky solutions
to problems without yet knowing user expectations/requirements?
Therefore, my first impression is to do a generic option #4:
option #4. This is a variation of "do nothing". My suggestion is you
can still replicate every DDL via logical replication messages but
just don't actually *apply* anything on the subscriber side for the
commands which are problematic (like this one is). Instead of
applying, the subscriber can just log a NOTICE about each command that
was skipped. This will make it easier for the user to know what didn’t
happen, but they can just cut/paste that command from the NOTICE if
they really want to. Also, this option #4 is deliberately generic,
which means you can do the same for every kind of DDL that proves too
difficult to automate in the first version (I doubt CREATE
SUBSCRIPTION will be the only example).
The option #4 result might look like this:
----
test_sub=# create subscription sub1 connection 'dbname=test_pub'
publication pub1;
NOTICE: created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
NOTICE: subscription "sub1" skipping DDL: create subscription
sub_node2 connection 'dbname=postgres host=node1 port=5432'
publication pub_node1;
...
----
(And if it turns out users really do want this then it can be
revisited in later patch versions)
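FWIW, on the apply side the skip could be as simple as something like
this (only a sketch to illustrate the idea; is_skippable_ddl and
execute_ddl_command are invented names, not from the current patches):

/*
 * Skip DDL commands the subscriber does not want to apply, and tell the
 * user about it instead of failing.
 */
static void
apply_handle_ddl(const char *ddl_command)
{
	if (is_skippable_ddl(ddl_command))	/* e.g. CREATE/ALTER SUBSCRIPTION */
	{
		ereport(NOTICE,
				(errmsg("subscription \"%s\" skipping DDL: %s",
						MySubscription->name, ddl_command)));
		return;
	}

	/* otherwise execute the deparsed command as usual */
	execute_ddl_command(ddl_command);
}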
------
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-13 09:07:52 |
| Message-ID: | [email protected] |
| Lists: | pgsql-general pgsql-hackers |
On 2023-Feb-06, Peter Smith wrote:
> I thought this comment was analogous to another one from this same
> patch 0001 (see seclabel.c), so the suggested change above was simply
> to make the wording consistent.
>
> @@ -134,6 +134,9 @@ ExecSecLabelStmt(SecLabelStmt *stmt)
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> errmsg("must specify provider when multiple security label providers
> have been loaded")));
> provider = (LabelProvider *) linitial(label_provider_list);
> +
> + /* Copy the provider name to the parsetree, needed for DDL deparsing
> of SecLabelStmt */
> + stmt->provider = pstrdup(provider->provider_name);
>
> So if the suggestion for the ExecuteGrantStmt comment was a mistake
> then perhaps the ExecSecLabelStmt comment is wrong also?
Well, here the patch would have us modify a parse tree node, which is
probably not a good thing to do. I don't remember whether I coded the
deparsing of any other object type this way, but nowadays modifying
parse trees is generally frowned upon. Most likely, this would have to
be done some other way. Maybe set the provider as secondary object
address for the command.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"La vida es para el que se aventura"
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-14 01:43:50 |
| Message-ID: | CAHut+Pv+MdX7TxJFa1D7PZrgAD6m9r0y4p4Zk+URH6m6ijmH7Q@mail.gmail.com |
| Lists: | pgsql-general pgsql-hackers |
FYI - the latest patch cannot be applied.
See cfbot [1]
------
[1] http://cfbot.cputube.org/patch_42_3595.log
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-14 09:27:26 |
| Message-ID: | [email protected] |
| Lists: | pgsql-general pgsql-hackers |
On 2023-Feb-14, Peter Smith wrote:
> FYI - the latest patch cannot be applied.
Yeah, that's because I applied the removal of "ON" to transforms'
identity string.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2023-02-14 10:28:23 |
| Message-ID: | OS0PR01MB57161EE6D2114EF14B3439E794A29@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Lists: | pgsql-general pgsql-hackers |
On Tuesday, February 14, 2023 9:44 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> FYI - the latest patch cannot be applied.
>
Thanks for reporting. I will post a rebased patch after fixing some of the
comments raised so far (in a day or so).
Best regards,
Hou zj
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-14 11:02:13 |
| Message-ID: | CAA4eK1+pdyQoYB4R5rzrxZjz2dNWW1p2iqAj7J9qWeTvKDyBiQ@mail.gmail.com |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Feb 10, 2023 at 4:36 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Feb 9, 2023 at 3:25 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
>
> Comments on 0001 and 0002
> =======================
>
Few more comments on 0001 and 0003
===============================
1. I think having 'internal' in an exposed function
pg_get_viewdef_internal() seems a bit odd to me. Shall we name it
something like pg_get_viewdef_sys() as it consults the system cache?
2. In pg_get_trigger_whenclause(), there are various things that have
changed in the new code but the patch didn't update those. It is
important to update those especially because it replaces the existing
code as well. For example, it should use GET_PRETTY_FLAGS for
prettyFlags, then some variables are not initialized, and also didn't
use rellockmode for old and new rtes. I suggest carefully comparing
the code with the corresponding existing code in the function
pg_get_triggerdef_worker().
3.
deparse_CreateTrigStmt
{
...
+
+ if (node->deferrable)
+ list = lappend(list, new_string_object("DEFERRABLE"));
+ if (node->initdeferred)
+ list = lappend(list, new_string_object("INITIALLY DEFERRED"));
+ append_array_object(ret, "%{constraint_attrs: }s", list);
...
}
Is there a reason that the above part of the conditions doesn't match
the below conditions in pg_get_triggerdef_worker()?
pg_get_triggerdef_worker()
{
...
if (!trigrec->tgdeferrable)
appendStringInfoString(&buf, "NOT ");
appendStringInfoString(&buf, "DEFERRABLE INITIALLY ");
if (trigrec->tginitdeferred)
appendStringInfoString(&buf, "DEFERRED ");
else
appendStringInfoString(&buf, "IMMEDIATE ");
...
}
4. In deparse_CreateTrigStmt(), the handling for REFERENCING OLD/NEW
Table is missing. See the corresponding code in
pg_get_triggerdef_worker().
5. In deparse_CreateTrigStmt(), the function name for EXECUTE
PROCEDURE is generated in a different way as compared to what we are
doing in pg_get_triggerdef_worker(). Is there a reason for the same?
6.
+char *
+pg_get_partkeydef_simple(Oid relid)
+{
+ return pg_get_partkeydef_worker(relid, 0, false, false);
+}
The 0 is not a valid value for prettyFlags, so I am not sure what the
intention is here. I think you need to use GET_PRETTY_FLAGS() here (a
sketch of what I mean is at the end of these comments).
7. The above comment applies to pg_get_constraintdef_command_simple() as well.
8. Can we think of better names instead of appending 'simple' in the
above two cases?
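For comment 6, what I would expect is roughly the following (only a
sketch, assuming the worker function keeps its current arguments):

char *
pg_get_partkeydef_simple(Oid relid)
{
	/* 0 is not a valid prettyFlags value; use the standard macro */
	return pg_get_partkeydef_worker(relid, GET_PRETTY_FLAGS(false), false, false);
}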
--
With Regards,
Amit Kapila.
| From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | RE: Support logical replication of DDLs |
| Date: | 2023-02-15 03:01:37 |
| Message-ID: | OS0PR01MB57168C9B93D7B2DC4D72F8EF94A39@OS0PR01MB5716.jpnprd01.prod.outlook.com |
| Lists: | pgsql-general pgsql-hackers |
On Friday, February 10, 2023 7:07 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Feb 9, 2023 at 3:25 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
>
> Comments on 0001 and 0002
> =======================
> 1.
> * CREATE COLLATION
> */
> ObjectAddress
> -DefineCollation(ParseState *pstate, List *names, List *parameters,
> bool if_not_exists)
> +DefineCollation(ParseState *pstate, List *names, List *parameters,
> + bool if_not_exists, ObjectAddress *from_existing_collid)
>
> I think it is better to expand function header comments to explain
> return values especially from_existing_collid.
Added.
> 2.
> +Form_pg_sequence_data
> +get_sequence_values(Oid sequenceId)
> +{
> + Buffer buf;
> + SeqTable elm;
> + Relation seqrel;
> + HeapTupleData seqtuple;
> + Form_pg_sequence_data seq;
> + Form_pg_sequence_data retSeq =
> palloc(sizeof(FormData_pg_sequence_data));
> +
> + /* Open and AccessShareLock sequence */
> + init_sequence(sequenceId, &elm, &seqrel);
>
> The comment above init_sequence() seems wrong to me. AFAICS, we
> acquire RowExclusiveLock in init_sequence() via
> lock_and_open_sequence(). Am I missing anything?
Changed.
> 3.
> +get_sequence_values(Oid sequenceId)
> ...
> ...
> +
> + if (pg_class_aclcheck(sequenceId, GetUserId(),
> + ACL_SELECT | ACL_UPDATE | ACL_USAGE) != ACLCHECK_OK)
> + ereport(ERROR,
> + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
>
> Why do we need to check UPDATE privilege just for reading the values?
I think it was mis-copied; removed.
>
> 4. I checked the callers of get_sequence_values and they just need
> 'last_val' but we still expose all values from Form_pg_sequence_data,
> not sure if that is required. In deparse_CreateSeqStmt(), we use it to
> append RESTART but the CREATE SEQUENCE syntax in docs doesn't have a
> RESTART clause, so I am confused as to why the patch appends the same.
> If it is really required then please add the comments for the same.
From the code, it seems CREATE SEQUENCE supports the RESTART keyword while the
document doesn’t mention it. I think I will start a separate thread to discuss
this.
>
> 5. In deparse_CreateSeqStmt() and deparse_ColumnIdentity(), we open
> SequenceRelationId, is that really required? Isn't locking the
> sequence relation sufficient as is done by get_sequence_values()?
> Also, I see that deparse_CreateSeqStmt() opens and locks the sequence
> whereas deparse_ColumnIdentity() doesn't do the same. Then, we unlock
> the relation in some cases but not in others (like get_sequence_values
> uses NoLock whereas others release the lock while closing the rel).
>
> 6. In get_sequence_values(), we check the permissions whereas callers
> just assume that it is done and don't check it while accessing the
> sequence. This makes the code a bit unclear.
>
> 7. After seeing the above inconsistencies, I am wondering whether it would
> be better to design get_sequence_values() such that it returns both the
> sequence parameters and last_val in a structure and the callers use
> it. That would be a bit cleaner and avoid opening the relation multiple
> times.
Agreed. Refactored this as suggested.
>
> 8.
> +/*
> + * Return the given object type as a string.
> + * If isgrant is true, then this function is called
> + * while deparsing GRANT statement and some object
> + * names are replaced.
> + */
> +const char *
> +stringify_objtype(ObjectType objtype, bool isgrant)
>
> Have an empty line after the Return line. The line length appears too short.
Adjusted.
>
> 9. Similar to stringify_grant_objtype() and
> stringify_adefprivs_objtype(), shall we keep the list of all
> unsupported types in stringify_objtype()? That will help us to easily
> identify which objects are not yet supported.
Added.
>
> 10. In pg_get_ruledef_detailed(), the code to form a string for qual
> and action is mostly the same as what we have in make_ruledef(). I
> think we can extract the common code into a separate function to avoid
> missing the code updates at one of the places. I see that
> 'special_exprkind' is present in one place and not in other, it may be
> that over time, we have added new things to deparse_context which
> doesn't get updated in the patch. Also, I noticed that for
> CMD_NOTHING, the patch just ignores the action whereas the core code
> does append the definition. We should check whether such a difference
> is required and if so, then add comments for the same.
I extracted the common code into a separate function as suggested,
and fixed these inconsistencies.
Here is the new version of the patch, which addresses the above comments.
I also fixed a bug in the deparsing of CREATE RULE where it didn't add
parentheses around the rule action list.
Thanks also to Vignesh for helping address the comments.
Best Regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v69-0007-Document-DDL-replication-and-DDL-deparser.patch | application/octet-stream | 40.6 KB |
| v69-0003-Support-DDL-deparse-of-the-rest-commands.patch | application/octet-stream | 200.9 KB |
| v69-0002-Functions-to-deparse-Table-DDL-commands.patch | application/octet-stream | 131.3 KB |
| v69-0001-Infrastructure-to-support-DDL-deparsing.patch | application/octet-stream | 40.8 KB |
| v69-0004-Introduce-the-test_ddl_deparse_regress-test-modu.patch | application/octet-stream | 47.4 KB |
| v69-0005-DDL-messaging-infrastructure-for-DDL-replication.patch | application/octet-stream | 41.6 KB |
| v69-0006-Support-DDL-replication.patch | application/octet-stream | 212.6 KB |
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-15 04:00:26 |
| Message-ID: | CAHut+PumBJFvdBOH0baPd-sSKxhvMG7=04GP9pH3LpO4WadX7A@mail.gmail.com |
| Lists: | pgsql-general pgsql-hackers |
On Sat, Feb 11, 2023 at 3:21 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Thu, 9 Feb 2023 at 03:47, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > Hi Vignesh, thanks for addressing my v63-0002 review comments.
> >
> > I confirmed most of the changes. Below is a quick follow-up for the
> > remaining ones.
> >
> > On Mon, Feb 6, 2023 at 10:32 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > On Mon, 6 Feb 2023 at 06:47, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > >
> > ...
> > > >
> > > > 8.
> > > > + value = findJsonbValueFromContainer(container, JB_FOBJECT, &key);
> > > >
> > > > Should the code be checking or asserting value is not NULL?
> > > >
> > > > (IIRC I asked this a long time ago - sorry if it was already answered)
> > > >
> > >
> > > Yes, this was already answered by Zheng, quoting as "The null checking
> > > for value is done in the upcoming call of expand_one_jsonb_element()."
> > > in [1]
> >
> > Thanks for the info. I saw that Zheng-san only wrote it is handled in
> > the “upcoming call of expand_one_jsonb_element”, but I don’t know if
> > that is sufficient. For example, if the execution heads down the other
> > path (expand_jsonb_array) with a NULL jsonarr then it is going to crash,
> > isn't it? So I still think some change may be needed here.
>
> Added an Assert for this.
>
Was this a correct change to make here?
IIUC this Assert is now going to intercept both cases, including the
expand_one_jsonb_element() path, which previously would have thrown a proper
ERROR.
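If the intention is to keep a proper user-facing error for malformed
input, an explicit check might be safer than an Assert. Just a sketch;
the error wording and the keystr variable are invented:

value = findJsonbValueFromContainer(container, JB_FOBJECT, &key);
if (value == NULL)
	ereport(ERROR,
			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
			 errmsg("element \"%s\" not found in JSON object", keystr)));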
------
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
| Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-15 04:32:53 |
| Message-ID: | CAHut+PusvDXTJa2drUPzePBBW0i9we8yns8Z4DU4b7JVs6Ew1Q@mail.gmail.com |
| Lists: | pgsql-general pgsql-hackers |
On Thu, Feb 9, 2023 at 8:55 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Fri, Feb 3, 2023 at 11:41 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > Here are some review comments for patch v63-0001.
> >
> > ======
> > src/backend/catalog/aclchk.c
> >
> > 3. ExecuteGrantStmt
> >
> > + /* Copy the grantor id needed for DDL deparsing of Grant */
> > + istmt.grantor_uid = grantor;
> > +
> >
> > SUGGESTION (comment)
> > Copy the grantor id to the parsetree, needed for DDL deparsing of Grant
> >
>
> didn't change this, as Alvaro said this was not a parsetree.
Perhaps there is more to do here? Sorry, I did not understand the
details of Alvaro's post [1], but I did not recognize the difference
between ExecuteGrantStmt and ExecSecLabelStmt, so my impression was that
one or both of these places are either wrongly commented or are doing
something that should not be done.
> > ======
> > src/backend/utils/adt/regproc.c
> >
> > 9.
> > +
> > +/*
> > + * Append the parenthesized arguments of the given pg_proc row into the output
> > + * buffer. force_qualify indicates whether to schema-qualify type names
> > + * regardless of visibility.
> > + */
> > +static void
> > +format_procedure_args_internal(Form_pg_proc procform, StringInfo buf,
> > + bool force_qualify)
> > +{
> > + int i;
> > + char* (*func[2])(Oid) = {format_type_be, format_type_be_qualified};
> > +
> > + appendStringInfoChar(buf, '(');
> > + for (i = 0; i < procform->pronargs; i++)
> > + {
> > + Oid thisargtype = procform->proargtypes.values[i];
> > + char *argtype = NULL;
> > +
> > + if (i > 0)
> > + appendStringInfoChar(buf, ',');
> > +
> > + argtype = func[force_qualify](thisargtype);
> > + appendStringInfoString(buf, argtype);
> > + pfree(argtype);
> > + }
> > + appendStringInfoChar(buf, ')');
> > +}
> >
> > 9a.
> > Assign argtype = NULL looks redundant because it will always be
> > overwritten anyhow.
> >
>
> changed this.
>
> > ~
> >
> > 9b.
> > I understand why this function was put here beside the other static
> > functions in "Support Routines" but IMO it really belongs nearby (i.e.
> > directly above) the only caller (format_procedure_args). Keeping both
> > those functional together will improve the readability of both, and
> > will also remove the need to have the static forward declaration.
> >
There was no reply for 9b. Was it accidentally overlooked, or did you
just choose not to do it?
Kind Regards,
Peter Smith.
Fujitsu Australia
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-15 05:57:04 |
| Message-ID: | CAA4eK1J2voRVoYBB=r4xtdzYTSPX7RnTcvXyYLk031YE6gWxKg@mail.gmail.com |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Feb 10, 2023 at 8:23 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Feb 9, 2023 at 6:55 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> (v67)
>
> I have some questions about adding the infrastructure for DDL deparsing.
>
> Apart from the changes made by 0001 patch to add infrastructure for
> DDL deparsing, 0002 patch seems to add some variables that are not
> used in 0002 patch:
>
> @@ -2055,6 +2055,7 @@ typedef struct AlterTableStmt
>     List       *cmds;         /* list of subcommands */
>     ObjectType objtype;       /* type of object */
>     bool       missing_ok;    /* skip error if table missing */
> +   bool       table_like;    /* internally generated for TableLikeClause */
> } AlterTableStmt;
>
> @@ -39,6 +40,7 @@ typedef struct CollectedATSubcmd
> {
>     ObjectAddress address;    /* affected column, constraint, index, ... */
>     Node       *parsetree;
> +   char       *usingexpr;
> } CollectedATSubcmd;
>
> typedef struct CollectedCommand
> @@ -62,6 +64,7 @@ typedef struct CollectedCommand
> {
>     Oid        objectId;
>     Oid        classId;
> +   bool       rewrite;
>     List       *subcmds;
> } alterTable;
>
> These three variables are used in 0006 patch.
>
Hmm, then it would be better to move these to the 0006 patch.
> Looking at 0006 patch (Support DDL replication), it seems to me that
> it includes not only DDL replication support but also changes for the
> event trigger. For instance, the patch adds
> EventTriggerAlterTypeStart() and EventTriggerAlterTypeEnd(). If these
> changes are required for DDL deparse, should we include them in 0001
> patch? Perhaps the same is true for
> EventTriggerCollectCreatePublication() and friends. IIUC the DDL
> deparse and DDL replication are independent features, so I think 0006
> patch should not include any changes for DDL deparse infrastructure.
>
AFAICS, these are required for DDL replication, so I'm not sure moving
them would be of any help.
> Also, 0003 and 0006 patches introduce SCT_Create/AlterPublication and
> change DDL deparse so that it deparses CREATE/ALTER PUBLICATION in a
> different way from other simple commands. Is there any reason for
> that? I mean, since EventTriggerCollectCreatePublication() collects
> the information from the parse tree, I wonder if we could use
> SCT_Simple for them.
>
Right, I think we should try a bit harder to avoid adding new
CollectedCommandTypes. Is there a reason that we can't retrieve the
additional information required to form the command string from the
system catalogs or the parse tree?
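For example, something like the below (just a sketch; the function name and
the pubid parameter are mine, not from the patch) could pull the publish
options from pg_publication at deparse time instead of carrying them in a
new collected command type:

/*
 * Sketch only; assumes the usual includes: access/htup_details.h,
 * catalog/pg_publication.h, lib/stringinfo.h, utils/syscache.h.
 */
static void
append_publish_option(StringInfo buf, Oid pubid)
{
    HeapTuple   tup;
    Form_pg_publication pubform;
    const char *sep = "";

    tup = SearchSysCache1(PUBLICATIONOID, ObjectIdGetDatum(pubid));
    if (!HeapTupleIsValid(tup))
        elog(ERROR, "cache lookup failed for publication %u", pubid);
    pubform = (Form_pg_publication) GETSTRUCT(tup);

    /* build the WITH (publish = '...') clause from the catalog flags */
    appendStringInfoString(buf, "WITH (publish = '");
    if (pubform->pubinsert)
    {
        appendStringInfoString(buf, "insert");
        sep = ", ";
    }
    if (pubform->pubupdate)
    {
        appendStringInfo(buf, "%supdate", sep);
        sep = ", ";
    }
    if (pubform->pubdelete)
    {
        appendStringInfo(buf, "%sdelete", sep);
        sep = ", ";
    }
    if (pubform->pubtruncate)
        appendStringInfo(buf, "%struncate", sep);
    appendStringInfoString(buf, "')");

    ReleaseSysCache(tup);
}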
--
With Regards,
Amit Kapila.
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-15 06:50:59 |
| Message-ID: | CAA4eK1J+bh+p=gQXOoiKCAQE7VUU3QD-As2TXaL3oKOjBvY3JA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Fri, Feb 10, 2023 at 10:01 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Fri, 10 Feb 2023 at 21:50, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > The attached v68 version patch has the changes for the same.
>
> I was not sure if we should support DDL replication of
> create/alter/drop subscription commands, as there might be some data
> inconsistency issues in cases like the following:
> #node1, which is running on port 5432
> create publication pub_node1 for all tables with ( PUBLISH = 'insert,
> update, delete, truncate');
>
> #node2, which is running on port 5433
> create publication pub_node2 for all tables with(PUBLISH = 'insert,
> update, delete, truncate', ddl = 'all');
> create subscription sub_node2 connection 'dbname=postgres host=node1
> port=5432' publication pub_node1;
>
> #node3
> create subscription sub_node3 connection 'dbname=postgres host=node2
> port=5433' publication pub_node2;
>
> #node1
> create table t1(c1 int );
>
> #node2
> create table t1(c1 int);
> alter subscription sub_node2 refresh publication;
>
> # Additionally, this command will be replicated to node3, creating a
> subscription sub2_node2 in node3 which will subscribe to data from node1
> create subscription sub2_node2 connection 'dbname=postgres host=node1
> port=5432' publication pub_node1;
>
> After this, any insert into t1 on node1 will be replicated to node2
> and node3; additionally, node2's replicated data (which was replicated
> from node1) will also be sent to node3, causing inconsistency. If the
> table has unique or primary key constraints, this will lead to an error.
>
Can't one use origin = none here to avoid errors?
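For example (just a sketch; this relies on the origin subscription option
that is already in HEAD), node3's subscription could be created as:

create subscription sub_node3 connection 'dbname=postgres host=node2 port=5433'
    publication pub_node2 with (origin = none);

With origin = none, node2 would send node3 only the changes that originated
on node2 itself, so the rows node2 received from node1 would not be applied
a second time on node3.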
--
With Regards,
Amit Kapila.
| From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-15 08:32:01 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On 2023-Feb-15, Peter Smith wrote:
> On Thu, Feb 9, 2023 at 8:55 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> > On Fri, Feb 3, 2023 at 11:41 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > 3. ExecuteGrantStmt
> > >
> > > + /* Copy the grantor id needed for DDL deparsing of Grant */
> > > + istmt.grantor_uid = grantor;
> > >
> > > SUGGESTION (comment)
> > > Copy the grantor id to the parsetree, needed for DDL deparsing of Grant
> >
> > didn't change this, as Alvaro said this was not a parsetree.
>
> Perhaps there is more to do here? Sorry, I did not understand the
> details of Alvaro's post [1], but I did not recognize the difference
> between ExecuteGrantStmt and ExecSecLabelStmt so it was my impression
> either one or both of these places are either wrongly commented, or
> maybe are doing something that should not be done.
These two cases are different. In ExecGrantStmt we're adding the
grantor OID to the InternalGrant struct, which is not a parse node, so
there's no strong reason not to modify it (and also the suggested
comment change is factually wrong). I don't know if the idea is great,
but at least I see no strong objection.
In the other case, as I said in [1], the patch proposes to edit the
parse node to add the grantor, but I think a better idea might be to
change the signature to
ExecSecLabelStmt(SecLabelStmt *stmt, ObjectAddress *provider) so that
the function can set the provider there; and caller passes
&secondaryObject, which is the method we adopted for this kind of thing.
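Roughly something like this (an untested sketch):

ObjectAddress
ExecSecLabelStmt(SecLabelStmt *stmt, ObjectAddress *provider);

and then in ProcessUtilitySlow:

    case T_SecLabelStmt:
        address = ExecSecLabelStmt((SecLabelStmt *) parsetree,
                                   &secondaryObject);
        break;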
[1] https://postgr.es/m/[email protected]
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-15 09:51:26 |
| Message-ID: | CAA4eK1+fCfeg02b4Uz2if=EbtuPZ56BC39d-+dVMP_y7P8Qv2g@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
On Wed, Feb 15, 2023 at 2:02 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2023-Feb-15, Peter Smith wrote:
>
> > On Thu, Feb 9, 2023 at 8:55 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Feb 3, 2023 at 11:41 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> > > > 3. ExecuteGrantStmt
> > > >
> > > > + /* Copy the grantor id needed for DDL deparsing of Grant */
> > > > + istmt.grantor_uid = grantor;
> > > >
> > > > SUGGESTION (comment)
> > > > Copy the grantor id to the parsetree, needed for DDL deparsing of Grant
> > >
> > > didn't change this, as Alvaro said this was not a parsetree.
> >
> > Perhaps there is more to do here? Sorry, I did not understand the
> > details of Alvaro's post [1], but I did not recognize the difference
> > between ExecuteGrantStmt and ExecSecLabelStmt so it was my impression
> > either one or both of these places are either wrongly commented, or
> > maybe are doing something that should not be done.
>
> These two cases are different. In ExecGrantStmt we're adding the
> grantor OID to the InternalGrant struct, which is not a parse node, so
> there's no strong reason not to modify it (and also the suggested
> comment change is factually wrong). I don't know if the idea is great,
> but at least I see no strong objection.
>
> In the other case, as I said in [1], the patch proposes to edit the
> parse node to add the grantor, but I think a better idea might be to
> change the signature to
> ExecSecLabelStmt(SecLabelStmt *stmt, ObjectAddress *provider) so that
> the function can set the provider there; and caller passes
> &secondaryObject, which is the method we adopted for this kind of thing.
>
+1, that is a better approach to make the required change in ExecSecLabelStmt().
--
With Regards,
Amit Kapila.
| From: | Zheng Li <zhengli10(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, li jie <ggysxcq(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, rajesh singarapu <rajesh(dot)rs0541(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Support logical replication of DDLs |
| Date: | 2023-02-15 22:16:05 |
| Message-ID: | CAAD30UL7nez+umBSoj8g8gSP6YK-c_HaT_+MO0+jvW70FAO4MA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-general pgsql-hackers |
Hi
We have not discussed much about the ownership of replicated objects.
Currently, replicated
objects belong to the subscription owner. However, it makes sense to
a