From: David E. W. <da...@ju...> - 2014-02-28 21:17:16
On Feb 27, 2014, at 7:48 PM, Ashutosh Bapat <ash...@en...> wrote:

> It might be due to the large amount of data sent from the datanode to the coordinator. When you see the message "connection to client lost" at the datanode, it means that the connection to the coordinator was lost. In XC, coordinators act as clients to the datanodes. Further, no message in the coordinator log implies that there wasn't any segfault or error on the coordinator which could result in losing the client (to the datanode). One way to verify this is to check what happens for smaller amounts of data. There is still some code in the executor which saves data from the datanode in a linked list, and with a large amount of data that process runs out of memory. You may find something in the system logs if that is true.

Ah ha. Now that I pay more attention to the statement in the log, I see what the problem is. That's a full table scan it's doing, on a very large table. I think the planner is making a mistake. The query I'm running is far more complicated than that bit in the log. Really, the full query should be able to run on each node, with the results aggregated on the coordinator. I suspect I need to add some more JOIN clauses to make sure that the planner knows how to run the query on each node.

> Please do the following:
> Run EXPLAIN VERBOSE on the query which showed this behavior, and in that output you will find what query is being sent to the datanode.

So I did this, but even with the EXPLAIN VERBOSE I got the disconnect error. With plain EXPLAIN, too. The query should not actually run without ANALYZE, right? This is 1.1, BTW.

> Reduce your data on the datanode such that that particular query returns maybe a few thousand rows to the coordinator. BTW, I have seen millions of rows being exchanged between the coordinator and datanode without problem, but there is still a case where large data would be a problem.
> Now, see if the query runs without problem.

I updated my query to make sure that I was joining on partitioned columns, thinking that would get the queries to run more on the data nodes, but it made no difference. I still got an error for a table scan on a very large table. :-(

David
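For reference, here is a minimal sketch of the diagnostic Ashutosh describes, with hypothetical table names (the real schema isn't shown in this thread) and both tables assumed to be distributed by hash on subscriber_id. EXPLAIN VERBOSE on an XC coordinator includes the SQL shipped to each datanode, so you can see whether a join on the distribution column was pushed down or whether the coordinator asked for full remote scans:

    -- Hypothetical tables; both assumed DISTRIBUTE BY HASH (subscriber_id).
    EXPLAIN VERBOSE
    SELECT t.txn_uuid, r.rule_reason, r.rule_score
    FROM transactions t
    JOIN transactions_rule r
      ON r.subscriber_id = t.subscriber_id   -- join on the distribution column
     AND r.txn_uuid = t.txn_uuid
    WHERE t.subscriber_id = 482900;

If the plan ships the whole join to the datanodes, the coordinator only aggregates results; if it instead shows separate remote scans of each table, the coordinator is pulling entire tables across the wire, which matches the failure described above.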
From: David E. W. <da...@ju...> - 2014-02-28 17:06:19
On Feb 27, 2014, at 10:51 PM, 鈴木 幸市 <ko...@in...> wrote:

> Sorry, it has not been made open. Similar to PG, I'm planning to release 1.0.3 and 1.1.1 after 1.2GA is out. 1.2GA will be out when the replicated table update/delete issue is fixed.

Ah-ha, okay, thank you for the clarification.

> I'm planning to include the following fixes in all the major/minor releases:
>
> 1. GTM Proxy fix to handle disconnection from backends: this caused "snapshot not available" errors under heavy workloads.
> 2. Fix the restriction on using temporary objects.
> 3. Fix the statement cancellation error to reduce random failures in the regression test.

These sound like useful improvements. The temporary object fix will be particularly welcome. Can you tell me where savepoint support is in your roadmap?

Thanks,

David
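For context, savepoints are the standard PostgreSQL mechanism for partial rollback within a transaction; the sketch below shows the syntax a cluster-wide implementation would need to honor, using a hypothetical accounts(id, balance) table:

    BEGIN;
    INSERT INTO accounts VALUES (1, 100);
    SAVEPOINT before_update;                 -- mark a point inside the transaction
    UPDATE accounts SET balance = 0 WHERE id = 1;
    ROLLBACK TO SAVEPOINT before_update;     -- undo only the UPDATE
    COMMIT;                                  -- the INSERT still commits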
From: 鈴木 幸市 <ko...@in...> - 2014-02-28 06:52:01
Sorry, it has not been made open. Similar to PG, I'm planning to release 1.0.3 and 1.1.1 after 1.2GA is out. 1.2GA will be out when the replicated table update/delete issue is fixed.

I'm planning to include the following fixes in all the major/minor releases:

1. GTM Proxy fix to handle disconnection from backends: this caused "snapshot not available" errors under heavy workloads.
2. Fix the restriction on using temporary objects.
3. Fix the statement cancellation error to reduce random failures in the regression test.

Regards;
---
Koichi Suzuki

On 2014/02/28 8:33, David E. Wheeler <da...@ju...> wrote:

> On Feb 27, 2014, at 3:29 PM, Koichi Suzuki <koi...@gm...> wrote:
>
>> It is not correct. 1.0 and 1.1 are maintained. The fix for this issue may not be easy, though.
>
> So, maintained, but with no plan to release new versions? Sure, some bugs may be too much trouble for a maintained branch, but since Abbas says there are no plans to release new versions, it sounds an awful lot to me like no issues will be fixed, at least not in a release.
>
> Best,
>
> David
From: Ashutosh B. <ash...@en...> - 2014-02-28 03:48:40
Hi David,

It might be due to the large amount of data sent from the datanode to the coordinator. When you see the message "connection to client lost" at the datanode, it means that the connection to the coordinator was lost. In XC, coordinators act as clients to the datanodes. Further, no message in the coordinator log implies that there wasn't any segfault or error on the coordinator which could result in losing the client (to the datanode). One way to verify this is to check what happens for smaller amounts of data. There is still some code in the executor which saves data from the datanode in a linked list, and with a large amount of data that process runs out of memory. You may find something in the system logs if that is true.

Please do the following:

1. Run EXPLAIN VERBOSE on the query which showed this behavior; in that output you will find what query is being sent to the datanode.
2. Reduce your data on the datanode such that that particular query returns maybe a few thousand rows to the coordinator. BTW, I have seen millions of rows being exchanged between the coordinator and datanode without problem, but there is still a case where large data would be a problem.
3. Now, see if the query runs without problem.

On Fri, Feb 28, 2014 at 6:23 AM, David E. Wheeler <da...@ju...> wrote:

> PGXC Hackers,
>
> I have finally loaded up my testing PGXC four-node cluster with a nice beefy database, similar to a PostgreSQL database we use for long-running reporting queries. I gathered up one of our slower-running queries (a 26.6-minute run) and ran it on XC. Alas, after a while, it died with this error:
>
> psql:slow.sql:73: connection to server was lost
>
> The coordinator log was not much help: nothing was logged. So I trolled through the logs on the data nodes. All four had these messages:
>
> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 LOG: could not send data to client: Connection reset by peer
> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 STATEMENT: SELECT subscriber_cmd_id, rule_reason, rule_score, txn_uuid, txn_timestamp_utc FROM ONLY subscriber_482900.transactions_rule tma WHERE (subscriber_id = 482900)
> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 FATAL: connection to client lost
> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 STATEMENT: SELECT subscriber_cmd_id, rule_reason, rule_score, txn_uuid, txn_timestamp_utc FROM ONLY subscriber_482900.transactions_rule tma WHERE (subscriber_id = 482900)
>
> No reason given for the dropped connection. I ran the query on the coordinator box, so psql should have connected via a socket rather than TCP. Out of curiosity, I looked at the logs for the other three coordinators. None had any error messages, either.
>
> So, no idea what's timing out; statement_timeout is set to 0. Here are the settings from my coordinator's postgresql.conf:
>
> max_connections = 100
> shared_buffers = 32MB
> log_destination = 'stderr'
> logging_collector = on
> log_directory = 'pg_log'
> log_filename = 'postgresql-%a.log'
> log_truncate_on_rotation = on
> log_rotation_age = 1d
> log_rotation_size = 0
> log_line_prefix = '< %m >'
> log_timezone = 'US/Pacific'
> datestyle = 'iso, mdy'
> timezone = 'US/Pacific'
> lc_messages = 'en_US.UTF-8'
> lc_monetary = 'en_US.UTF-8'
> lc_numeric = 'en_US.UTF-8'
> lc_time = 'en_US.UTF-8'
> default_text_search_config = 'pg_catalog.english'
> pgxc_node_name = 'node1'
> port = 5432
> listen_addresses = '*'
> shared_buffers = 250MB
> work_mem = 128MB
> maintenance_work_mem = 128MB
> effective_cache_size = 8GB
> log_line_prefix = '%t %u %r %p %c '
> timezone = 'UTC'
> gtm_host = 'node1.example.com'
>
> And from one of the data nodes (only the names differ on the others):
>
> max_connections = 100
> shared_buffers = 32MB
> log_destination = 'stderr'
> logging_collector = on
> log_directory = 'pg_log'
> log_filename = 'postgresql-%a.log'
> log_truncate_on_rotation = on
> log_rotation_age = 1d
> log_rotation_size = 0
> log_line_prefix = '< %m >'
> log_timezone = 'US/Pacific'
> datestyle = 'iso, mdy'
> timezone = 'US/Pacific'
> lc_messages = 'en_US.UTF-8'
> lc_monetary = 'en_US.UTF-8'
> lc_numeric = 'en_US.UTF-8'
> lc_time = 'en_US.UTF-8'
> default_text_search_config = 'pg_catalog.english'
> pgxc_node_name = 'node1'
> port = 15432
> listen_addresses = '*'
> shared_buffers = 750MB
> work_mem = 128MB
> maintenance_work_mem = 128MB
> effective_cache_size = 23GB
> log_line_prefix = '%t %u %r %p %c '
> timezone = 'UTC'
> gtm_host = 'node1.iovationnp.com'
>
> Thoughts? What could be timing out?
>
> Thanks,
>
> David

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
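As an aside, one way to act on the "reduce your data" suggestion without disturbing the original table is to run the failing query against a small copy; a sketch, assuming CREATE TABLE ... AS works as in stock PostgreSQL on the XC build in use:

    -- Build a few-thousand-row sample of the large table from the log above,
    -- then point the failing query at it to see whether the disconnect persists.
    CREATE TABLE transactions_rule_sample AS
    SELECT * FROM subscriber_482900.transactions_rule
    LIMIT 5000;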
From: David E. W. <da...@ju...> - 2014-02-28 00:53:22
PGXC Hackers,

I have finally loaded up my testing PGXC four-node cluster with a nice beefy database, similar to a PostgreSQL database we use for long-running reporting queries. I gathered up one of our slower-running queries (a 26.6-minute run) and ran it on XC. Alas, after a while, it died with this error:

    psql:slow.sql:73: connection to server was lost

The coordinator log was not much help: nothing was logged. So I trolled through the logs on the data nodes. All four had these messages:

> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 LOG: could not send data to client: Connection reset by peer
> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 STATEMENT: SELECT subscriber_cmd_id, rule_reason, rule_score, txn_uuid, txn_timestamp_utc FROM ONLY subscriber_482900.transactions_rule tma WHERE (subscriber_id = 482900)
> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 FATAL: connection to client lost
> 2014-02-27 15:45:51 PST dwheeler 10.4.34.1(56968) 22213 530fc838.56c5 STATEMENT: SELECT subscriber_cmd_id, rule_reason, rule_score, txn_uuid, txn_timestamp_utc FROM ONLY subscriber_482900.transactions_rule tma WHERE (subscriber_id = 482900)

No reason given for the dropped connection. I ran the query on the coordinator box, so psql should have connected via a socket rather than TCP. Out of curiosity, I looked at the logs for the other three coordinators. None had any error messages, either.

So, no idea what's timing out; statement_timeout is set to 0. Here are the settings from my coordinator's postgresql.conf:

max_connections = 100
shared_buffers = 32MB
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_line_prefix = '< %m >'
log_timezone = 'US/Pacific'
datestyle = 'iso, mdy'
timezone = 'US/Pacific'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
pgxc_node_name = 'node1'
port = 5432
listen_addresses = '*'
shared_buffers = 250MB
work_mem = 128MB
maintenance_work_mem = 128MB
effective_cache_size = 8GB
log_line_prefix = '%t %u %r %p %c '
timezone = 'UTC'
gtm_host = 'node1.example.com'

And from one of the data nodes (only the names differ on the others):

max_connections = 100
shared_buffers = 32MB
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_line_prefix = '< %m >'
log_timezone = 'US/Pacific'
datestyle = 'iso, mdy'
timezone = 'US/Pacific'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
pgxc_node_name = 'node1'
port = 15432
listen_addresses = '*'
shared_buffers = 750MB
work_mem = 128MB
maintenance_work_mem = 128MB
effective_cache_size = 23GB
log_line_prefix = '%t %u %r %p %c '
timezone = 'UTC'
gtm_host = 'node1.iovationnp.com'

Thoughts? What could be timing out?

Thanks,

David
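One quick way to rule out a server-side timer beyond statement_timeout is to list every timeout-related setting on the coordinator and on a datanode; pg_settings is a standard PostgreSQL catalog view, so this should work unchanged on XC:

    -- Defaults/zeros here mean no server-side timeout is killing the query.
    SELECT name, setting, unit
    FROM pg_settings
    WHERE name LIKE '%timeout%'
       OR name LIKE 'tcp_keepalives%';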