From: Pavan D. <pav...@gm...> - 2012-06-26 03:28:16
On Tue, Jun 26, 2012 at 8:22 AM, Peng-Chong LIU <li...@gm...> wrote:

> Thanks for your instructions. It is exactly the same as Mr. Suzuki's.
>
> My xc cluster was configured on 3 PC servers/Gigabit Ethernet. I read some
> xc documentations, which suggest the number of coordinators should be 1/3
> of that of datanodes for OLTP cases. However, I tried more possibilities.
>
> (1) 1 coordinator (server #3) and 1 datanode (server #3): 666 tps
> (2) 1 coordinator (server #3) and 2 datanodes (server #1 & #3): 542 tps
> (3) 1 coordinator (server #3) and 3 datanodes (server #1, #2, & #3): 554 tps
> (4) 3 coordinators (server #1, #2, & #3), and 3 datanodes (server #1, #2, & #3): 426 tps
> (5) Same as (4), but with "preferred" datanode settings: 20 tps!!!
>
> For (4) and (5), I cannot run pgxc_test_launcher.sh with multiple servers
> in the same appServerList.data, since the multiple dbdriver processes write
> the same log files. It seems the log files are corupted in this situation.
> I have to run pgxc_test_launcher.sh with single dbdriver process on
> different servers at the same time, then add the results together.
>
> Does other conditions (network speed among nodes, think_time, etc.)
> matter? Sometimes, when I raised the workload, for example, think_time = 0,
> the cluster crashed (either process gtm or postgres).

Have you set up a GTM proxy on each coordinator? If not, I recommend adding them to the configuration. That can make a significant difference with a large number of connections.

Thanks,
Pavan
From: Peng-Chong L. <li...@gm...> - 2012-06-26 02:52:38
Thanks for your instructions. It is exactly the same as Mr. Suzuki's.

My XC cluster was configured on 3 PC servers connected by Gigabit Ethernet. I read some XC documentation, which suggests that the number of coordinators should be 1/3 of the number of datanodes for OLTP cases. However, I tried more possibilities.

(1) 1 coordinator (server #3) and 1 datanode (server #3): 666 tps
(2) 1 coordinator (server #3) and 2 datanodes (server #1 & #3): 542 tps
(3) 1 coordinator (server #3) and 3 datanodes (server #1, #2, & #3): 554 tps
(4) 3 coordinators (server #1, #2, & #3), and 3 datanodes (server #1, #2, & #3): 426 tps
(5) Same as (4), but with "preferred" datanode settings: 20 tps!!!

For (4) and (5), I cannot run pgxc_test_launcher.sh with multiple servers in the same appServerList.data, since the multiple dbdriver processes write to the same log files. It seems the log files are corrupted in this situation. I have to run pgxc_test_launcher.sh with a single dbdriver process on different servers at the same time, then add the results together.

Do other conditions (network speed among nodes, think_time, etc.) matter? Sometimes, when I raised the workload, for example with think_time = 0, the cluster crashed (either the gtm or the postgres process).

Since both of you mentioned the preferred node thing, I will redo test case (5) to make sure I did not do something wrong.
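Adding together the results of separate single-dbdriver runs, as described for cases (4) and (5), is plain summation of the per-run throughputs. A minimal sketch of that bookkeeping follows. The function name `sum_tps` and the three per-server values are hypothetical, since the thread only reports the combined figure.

```shell
#!/bin/sh
# Hypothetical helper: add up per-run throughput figures (tps).
# The individual values passed in below are illustrative only; the
# thread gives just the combined result.
sum_tps() {
    # Print one value per line and let awk accumulate them.
    printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.1f\n", total }'
}

sum_tps 140.5 150 135.5
```

Summing is only meaningful if the runs really overlap in time, so the individual runs should be started together and use the same duration.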
From: Michael P. <mic...@gm...> - 2012-06-26 02:27:12
It looks like you have been able to set up a cluster, which is already a good step.

What is the cluster structure you are using with those 3 servers?
Is it 1 Coordinator and 1 Datanode per server?

We get good performance here by grouping a Datanode and a Coordinator on the same server and then using a feature called the preferred node, which directs reads of replicated tables to the local node and so heavily reduces network traffic.
For example, assuming that in your case you have 3 Coordinators and 3 Datanodes on those 3 servers, each server having 1 Coordinator and 1 Datanode, you need to define the preferred Datanode of Coordinator 1 as Datanode 1, the preferred Datanode of Coordinator 2 as Datanode 2, and the same for Coordinator 3/Datanode 3.

You can define a preferred node by using CREATE NODE or ALTER NODE:
http://postgres-xc.sourceforge.net/docs/1_0/sql-createnode.html
http://postgres-xc.sourceforge.net/docs/1_0/sql-alternode.html
For example, to create a Datanode as a preferred node on a Coordinator, you just need to do:
CREATE NODE certain_dn (PORT = $port, PREFERRED);
or ALTER NODE certain_dn (PREFERRED);
Once defined, all the reads of replicated tables will go to this node (here certain_dn) in priority when an SQL statement reaches the Coordinator where the preferred node is defined.
This really improves performance of DBT-1.

Just by reading your email, I can say that there is no problem with the DBT-1 settings.
Have a try of the preferred node feature :)

--
Michael Paquier
http://michael.otacoo.com
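The pairing described in this message, where each Coordinator prefers the Datanode on its own server, can be sketched as a small script that prints the statement to run on each Coordinator. The node names `coordN` and `dnN` are hypothetical. The `WITH (...)` form is my reading of the ALTER NODE reference page linked in the message and should be checked there against the installed version. The script only prints SQL and does not connect to anything.

```shell
#!/bin/sh
# Sketch only: for each server, print the statement that marks the
# co-located Datanode as preferred on that server's Coordinator.
# Node names (coordN, dnN) are hypothetical.
preferred_node_sql() {
    # $1 = number of servers, each assumed to host coordN and dnN
    i=1
    while [ "$i" -le "$1" ]; do
        # Assumed syntax; verify against the sql-alternode page.
        printf '%s\n' "-- run on coord$i" "ALTER NODE dn$i WITH (PREFERRED);"
        i=$((i + 1))
    done
}

preferred_node_sql 3
```

Each statement has to be executed on the Coordinator named in the comment above it, because the preferred-node setting is local to the Coordinator where it is defined.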
From: Peng-Chong L. <li...@gm...> - 2012-06-26 01:50:24
Hi there,

I would like to reproduce the DBT-1 performance test on an XC cluster, so that I can understand its mechanism and limitations better. However, I cannot get the expected results.

I used the benchmark utility from the XC git repository. A single-node XC cluster reached ca. 70% of the tps of PostgreSQL, which is reasonable. However, performance of 2-node and 3-node clusters dropped to only ca. 60% of PostgreSQL.

With the kind help of Mr. Suzuki in the XC project team, I adjusted some cluster configuration. However, there was little improvement in the benchmark results.

Do you have an internal dbt-1 test procedure or any clue to this problem (xc optimization/dbt-1 test parameters)?

Thanks and regards,
Liu

Test Results:
Pure PostgreSQL: node1 846 tps, node2 837 tps, node3 921 tps
Single node xc: node3 666 tps
2-node xc: 542 tps
3-node xc (1 coordinator): 554 tps
3-node xc (3 coordinator): 426 tps

Test Procedure:

# download source
git clone git://postgres-xc.git.sourceforge.net/gitroot/postgres-xc/dbt1

# build
cd dbt1
make clean
autoconf
autoheader
./configure --with-postgresql=/opt/pgxc
make
make install

# generate test data
mkdir ~/test_data
./datagen/datagen -i 10000 -u 100 -p ~/test_data -T i
./datagen/datagen -i 10000 -u 100 -p ~/test_data -T c
./datagen/datagen -i 10000 -u 100 -p ~/test_data -T a

# create database
psql postgres -c "create database dbt1;"
psql dbt1 -f "./scripts/pgsql/create_tables.sql"
psql dbt1 -f "./scripts/pgsql/create_indexes.sql"
psql dbt1 -f "./scripts/pgsql/create_sequence.sql"

# load test data
psql dbt1 -c "COPY address FROM '/tmp/address.data' DELIMITER '>';"
psql dbt1 -c "COPY author FROM '/tmp/author.data' DELIMITER '>';"
psql dbt1 -c "COPY cc_xacts FROM '/tmp/cc_xacts.data' DELIMITER '>';"
psql dbt1 -c "COPY country FROM '/tmp/country.data' DELIMITER '>';"
psql dbt1 -c "COPY customer FROM '/tmp/customer.data' DELIMITER '>';"
psql dbt1 -c "COPY item FROM '/tmp/item.data' DELIMITER '>';"
psql dbt1 -c "COPY order_line FROM '/tmp/order_line.data' DELIMITER '>';"
psql dbt1 -c "COPY orders FROM '/tmp/orders.data' DELIMITER '>';"
psql dbt1 -c "COPY stock FROM '/tmp/stock.data' DELIMITER '>';"

# setup test program
#cp appServerList.data.sample appServerList.data
echo "127.0.0.1;5432;9992" > appServerList.data

cp pgxc_stats_param.data.sample pgxc_stats_param.data
sed -i "s/28800/288000/" pgxc_stats_param.data # customers
sed -i "s/4000/300/" pgxc_stats_param.data # duration
sed -i "s/7.2/0.1/" pgxc_stats_param.data # think time
sed -i "s/500/100/" pgxc_stats_param.data # eu & eu/min

# perform test
export PGUSER=pgxc # export PGUSER=postgres
export SID1=dbt1
chmod 755 pgxc_test_launcher.sh
rm -f *.log
./pgxc_test_launcher.sh &

# see results
rm -f ~/BT ~/ips.csv
./tools/results --mixfile mix.log --outputdir ~/
cat ~/BT
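The "ca. 70%" and "ca. 60%" figures can be reproduced from the test results in this message by dividing each XC throughput by a PostgreSQL baseline. The sketch below takes node3's 921 tps as the baseline, which is an assumption on my part (the single-node XC run was on node3); the function name `pct_of_baseline` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: express a throughput as a percentage of a baseline.
pct_of_baseline() {
    # $1 = measured tps, $2 = baseline tps
    awk -v tps="$1" -v base="$2" 'BEGIN { printf "%.1f\n", 100 * tps / base }'
}

# Baseline assumed to be plain PostgreSQL on node3 (921 tps).
for tps in 666 542 554 426; do
    printf '%s tps: %s%% of baseline\n' "$tps" "$(pct_of_baseline "$tps" 921)"
done
```

With that baseline the single-node result comes out at roughly 72% and the 2-node and 3-node (1 coordinator) results at roughly 59% to 60%, in line with the figures quoted in the message; the 3-coordinator run falls below 50%.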
From: Koichi S. <koi...@gm...> - 2012-06-21 11:01:19
Yes. And we can share this with coordinator and datanode.
---
Koichi Suzuki
From: Ashutosh B. <ash...@en...> - 2012-06-21 10:55:58
BTW, we will need to implement something like this for gtm_ctl for GTM and GTM-proxy, to be compatible in API.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Ashutosh B. <ash...@en...> - 2012-06-21 10:55:08
Here is a simpler idea for implementing the watchdog, which does not need any new mechanism.

-- from http://www.postgresql.org/docs/9.0/static/app-pg-ctl.html
Showing the Server Status

Here is a sample status output from pg_ctl:

$ pg_ctl status
pg_ctl: postmaster is running (pid: 13718)
Command line was:
/usr/local/pgsql/bin/postmaster '-D' '/usr/local/pgsql/data' '-p' '5433' '-B' '128'

--
Somebody who wants to check whether a server is running or not needs to run pg_ctl status frequently on the same machine where the server is running. This tool can be built external to the XC core engine.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
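The polling approach suggested in this message could be sketched as a small loop that runs a status command at a fixed interval and reports a fault on the first failure. The function name `poll_status` and its arguments are hypothetical. The status command is passed in as arguments, so `pg_ctl status -D "$PGDATA"` can be substituted on a real node; the sketch assumes that command exits with a non-zero status when the server is not running.

```shell
#!/bin/sh
# Sketch: run a status command up to N times, INTERVAL seconds apart.
# Returns 0 if every check succeeded, 1 on the first failed check.
poll_status() {
    tries=$1
    interval=$2
    shift 2
    n=0
    while [ "$n" -lt "$tries" ]; do
        if ! "$@" >/dev/null 2>&1; then
            echo "fault detected on check $((n + 1))"
            return 1
        fi
        n=$((n + 1))
        # Sleep between checks, but not after the last one.
        [ "$n" -lt "$tries" ] && sleep "$interval"
    done
    echo "all $tries checks passed"
    return 0
}

# Stand-in commands so the sketch runs anywhere; replace `true` with
# e.g. pg_ctl status -D "$PGDATA" on a real node.
poll_status 2 0 true
poll_status 2 0 false || echo "would raise an alert here"
```

Because the check has to run on the same machine as the server, the detection time is bounded below by the polling interval, which is the trade-off discussed in the rest of the thread.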
From: Koichi S. <koi...@gm...> - 2012-06-21 10:43:14
Yes, but I think we should not recommend doing this. It works only by chance. If we say psql can be used with a datanode directly, this may lead to a lot of misuse. Also, psql will get a warning from the datanode.

Some people are not comfortable using a method that produces warnings in the normal case. Modifying psql to get a GXID and snapshot will be a bit of work, which is a separate task. The watchdog mechanism can be shared among gtm, gtm_proxy, coordinator and datanode. So I prefer the watchdog idea.

At present, psql detection may take as long as one minute. Most critical systems want to detect a fault in seconds (maybe less than ten seconds).
----------
Koichi Suzuki
From: Ashutosh B. <ash...@en...> - 2012-06-21 09:46:58
|
Hi Suzuki-san,
I have a few questions.

On Wed, Jun 20, 2012 at 12:12 PM, Koichi Suzuki <koi...@gm...> wrote:

> To monitor if each XC component is running, psql is not sufficient
> because it does not check gtm/gtm_proxy/datanode.

Datanodes can still be checked with psql, or am I missing something? Since GTM and GTM proxy are message-based, similar to PG, it should be possible to modify psql to "ping" GTM. Have we thought about these possibilities?

> Also, psql
> detection may take time.

What's the scale of this delay: seconds, minutes?

> As discussed in the cluster summit
> (http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit),
> watchdog time will be nice for this purpose.
>
> Here's a design of watchdog timer:
>
> 1. Have separate shared memory for each component,
> 2. Postmaster and gtm/gtm_proxy server main loop increment each watchdog
>    timer,
> 3. Timer will be detected by separate command to report any fault
>
> For this purpose, need some GUC and GTM/GTM-Proxy configuration
> parameters to specify
> a. If watchdog time is on
> b. Timer increment interval (maybe in milliseconds)
>
> Shmid for each component will be kept in pg_control, gtm.control and
> gtm_proxy.control files.

Every time we add a new shared memory segment, we have to take care to manage it correctly across crashes and shutdowns (normal or forced). This becomes a code maintenance burden and can be a headache over time. Would it be possible to use the same shared memory as PG, with a small portion of it allocated for the watchdog timer?

Also, letting an external application attach to the database's shared memory (whether new or existing) can be a security hole. We have to be very careful here. So we should try not to use something as invasive as shared memory. Message queues, pipes, or sockets might be better options.

> API to attach shared memory for watchdog and read the timer value will
> be provided too.
>
> Regards;
> ----------
> Koichi Suzuki

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company |
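The alternative raised in this message (a pipe or socket instead of shared memory) could look roughly like the sketch below. It is only an illustration under stated assumptions: `heartbeat_send` and `heartbeat_wait` are hypothetical names, not part of Postgres-XC, and the transport is a plain POSIX pipe. The monitored component writes a sequence number, and the monitor waits with a timeout so that a missing heartbeat is noticed within a bounded time.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Component side (hypothetical helper): write the current heartbeat
 * sequence number. Returns 0 on success, -1 on error. */
int heartbeat_send(int fd, uint32_t seq)
{
    ssize_t n = write(fd, &seq, sizeof seq);
    return n == (ssize_t) sizeof seq ? 0 : -1;
}

/* Monitor side (hypothetical helper): wait up to timeout_ms for a
 * heartbeat. Returns 1 and stores the sequence number if one arrived,
 * 0 on timeout (the component may be down), -1 on error. */
int heartbeat_wait(int fd, int timeout_ms, uint32_t *seq)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    assert(seq != NULL);
    rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc;              /* 0 = timeout, -1 = poll() failed */
    if (read(fd, seq, sizeof *seq) != (ssize_t) sizeof *seq)
        return -1;              /* short read or writer closed the pipe */
    return 1;
}
```

With this shape the monitor never attaches to the server's memory; it only needs a file descriptor, which addresses the security concern at the cost of an extra channel per component.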
From: Michael P. <mic...@gm...> - 2012-06-21 09:19:46
|
OK, so I'll check it a bit and then commit it.
--
Michael Paquier
http://michael.otacoo.com |
From: Koichi S. <koi...@gm...> - 2012-06-21 09:16:52
|
Yes, I tested it in both 32-bit and 64-bit environments. The cause of the error was very straightforward, and the patch should not have any side effects.

Regards;
----------
Koichi Suzuki |
From: Michael P. <mic...@gm...> - 2012-06-21 08:15:08
|
Just a question here.
Have you tested this patch in both 64b and 32b environments?
--
Michael Paquier
http://michael.otacoo.com |
From: Koichi S. <koi...@gm...> - 2012-06-20 06:42:20
|
To monitor whether each XC component is running, psql is not sufficient because it does not check gtm/gtm_proxy/datanode. Also, detection with psql may take time. As discussed at the cluster summit (http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit), a watchdog timer would be nice for this purpose.

Here's a design for the watchdog timer:

1. Have separate shared memory for each component.
2. The postmaster and the gtm/gtm_proxy server main loops increment their respective watchdog timers.
3. A separate command checks the timers and reports any fault.

For this purpose, we need some GUC and GTM/GTM-Proxy configuration parameters to specify:

a. Whether the watchdog timer is on
b. The timer increment interval (maybe in milliseconds)

The shmid for each component will be kept in the pg_control, gtm.control and gtm_proxy.control files.

An API to attach the watchdog shared memory and read the timer value will be provided too.

Regards;
----------
Koichi Suzuki |
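The three steps of this design can be sketched in C as follows. This is a minimal illustration, not the proposed implementation: `WatchdogArea`, `watchdog_tick` and `watchdog_stalled` are hypothetical names, and the struct is an ordinary object here, whereas in the design it would live in the shared memory segment whose shmid is recorded in the control file. A real version would also need to decide how the counter is updated safely across processes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of one component's watchdog area. */
typedef struct {
    bool              enabled;     /* parameter (a): is the watchdog on? */
    uint32_t          interval_ms; /* parameter (b): increment interval  */
    volatile uint64_t ticks;       /* bumped by the component's main loop */
} WatchdogArea;

/* Step 2: called once per iteration of the component's main loop. */
void watchdog_tick(WatchdogArea *wd)
{
    assert(wd != NULL);
    if (wd->enabled)
        wd->ticks++;
}

/* Step 3: called by the external checker with the tick value it read on
 * its previous visit, at least one interval earlier. Returns true when
 * the counter has not moved, i.e. the component looks stalled. */
bool watchdog_stalled(const WatchdogArea *wd, uint64_t previous_ticks)
{
    assert(wd != NULL);
    if (!wd->enabled)
        return false;              /* nothing to monitor */
    return wd->ticks == previous_ticks;
}
```

A checker command would read `ticks`, sleep for somewhat longer than `interval_ms`, and call `watchdog_stalled` with the earlier value. The detection latency is then bounded by the interval rather than by a connection timeout.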
From: Koichi S. <koi...@gm...> - 2012-06-20 06:21:32
|
Fixed the bug. The cause was incompatible definition of GTM_ThreadInfo and GTMProxy_ThreadInfo. The third entry in GTM_ThreadInfo, is_main_thread, is missing in GTMProxy_ThreadInfo, which caused different offset of the following members in 32bit environment. In 64-bit environment, space for the missing member is padded. Fixing patch is enclosed. Regards; ---------- Koichi Suzuki 2012/6/20 Koichi Suzuki <koi...@gm...>: > Somehow, this thread run in private, which is not good at all. > > Gtm_proxy crash in 32bit environment has been discussed between me and > Plexo Rama. Now I added this to the bug tracker with ID 3536469. > > So far, in 32bit environment, I found that thr_thread_context and > thr_current_context are not set properly. They're set to NULL. On > the other hand, in 64bit, all the thread information members are set > to proper values. > > Now finding what made this difference. > > Regards; > ---------- > Koichi Suzuki > > > 2012/6/19 Koichi Suzuki <koi...@gm...> >> >> Hi, >> >> The problem is MemoryContextAllocZero receives NULL MemoryContext, which >> shall be CurrentMemoryContext. palloc0() does this and >> CurrentMemoryContext should have been set in BaseInit(). >> >> It's very straightforward and there're no struct involved and I need to >> run it in the 32bit environment to see what is going on really. >> >> Anyway, this information saved much of my time. 
>> >> Thank you very much; >> ---------- >> Koichi Suzuki >> >> >> >> 2012/6/18 plexo rama <ple...@gm...> >>> >>> Suzuki-san, >>> >>> this is the output of the backtrace command: >>> >>> #0 0x08052bcc in MemoryContextAllocZero (context=0x0, size=128) at >>> mcxt.c:590 >>> #1 0x08051da1 in GTMProxy_ThreadAdd (thrinfo=0x807c538) at >>> proxy_thread.c:72 >>> #2 0x0804ac02 in BaseInit () at proxy_main.c:279 >>> #3 0x0804bd3f in main (argc=3, argv=0xbffff564) at proxy_main.c:836 >>> >>> Please note that the line 590 of mcxt.c maps to line 585 in the original >>> mcxt.c (wich is distributed in the v1.0.0 archive). >>> You must have an instance of GTM running before starting gtm_proxy, >>> otherwise the segfault won't occur. >>> >>> >>> Plexo >>> >>> >>> >>> 2012/6/18 Koichi Suzuki <koi...@gm...> >>>> >>>> I'm trying to fix this though it is only for 32bit. >>>> >>>> Because it may take a bit for me to build 32bit environment, If you >>>> have core file of GTM crash, it's very helpful if you send be back trace of >>>> the core (bt command) by gdb. I hope back trace will pinpoint the cause of >>>> the bug. >>>> >>>> Best Regards; >>>> ---------- >>>> Koichi Suzuki >>>> >>>> >>>> >>>> 2012/6/17 plexo rama <ple...@gm...> >>>>> >>>>> Suzuki-san, >>>>> >>>>> it seems the problem only occurs on 32bit systems. >>>>> >>>>> I've compiled the source using CLANG, gcc-4.4 & gcc-4.6 on Ubuntu >>>>> 10.04.2 LTS 32bit system running in a virtual machine hosted on linode.com. >>>>> >>>>> The result was the same each try. I've tried it with the latest >>>>> 1.0.0-beta package as well as v0.9.7. >>>>> >>>>> When starting gtm_proxy on a 32bit system in gdb I always receive >>>>> >>>>> Program received signal SIGSEGV, Segmentation fault. 
>>>>> 0x080627ed in MemoryContextAllocZero (context=0x0, size=128) at >>>>> mcxt.c:585 >>>>> 585 ret = (*context->methods->alloc) (context, size); >>>>> >>>>> >>>>> However, I've had success compiling & running the code / gtm_proxy on a >>>>> 64bit system running Ubuntu 10.04.2 LTS 64bit. >>>>> >>>>> I hope that helps? >>>>> >>>>> I'm not sure whether it make sense to look further in to that issue as >>>>> 32bit environments don't really make sense as a production system, except >>>>> for testing. >>>>> Although, it would be good to know, what the issue really is. >>>>> >>>>> >>>>> Plexo >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> 2012/6/16 Koichi Suzuki <koi...@gm...> >>>>>> >>>>>> Plexo; >>>>>> >>>>>> Thanks a lot for the report. I will look into it when back to Japan >>>>>> (now I'm in Beijing). >>>>>> >>>>>> To reproduce the problems, could you let me know your configuration >>>>>> (port and hosts of each component, including GTM, GTM_Proxy, Coordinator and >>>>>> datanodes) and how to reproduce the problem? >>>>>> >>>>>> Also, could you send me bt of the core file? >>>>>> >>>>>> Best Regards; >>>>>> ---------- >>>>>> Koichi Suzuki >>>>>> >>>>>> >>>>>> >>>>>> 2012/6/16 plexo rama <ple...@gm...> >>>>>>> >>>>>>> Suzuki-san, >>>>>>> >>>>>>> I've received another segfault in gtm_proxy >>>>>>> >>>>>>> Program received signal SIGSEGV, Segmentation fault. >>>>>>> [Switching to Thread 0xb7e56b70 (LWP 10493)] >>>>>>> 0x0806961f in errfinish (dummy=0) at elog.c:320 >>>>>>> 320 EmitErrorReport(MyPort); >>>>>>> >>>>>>> >>>>>>> it looks like elog.c also requires a special handling, >>>>>>> as MyPort-macro uses GetMyThreadInfo and thus refers >>>>>>> to GTM_ThreadInfo instead of GTMProxy_ThreadInfo >>>>>>> >>>>>>> 2012/6/15 Koichi Suzuki <koi...@gm...> >>>>>>>> >>>>>>>> Yes, I will do it. 
>>>>>>>> >>>>>>>> --- >>>>>>>> Koichi Suzuki >>>>>>>> >>>>>>>> On 2012/06/15, at 8:07, Michael Paquier <mic...@gm...> >>>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Jun 15, 2012 at 7:37 AM, Koichi Suzuki >>>>>>>> <koi...@gm...> wrote: >>>>>>>>> >>>>>>>>> Thanks for the pointing out. Another requirement was to make >>>>>>>>> mcxt.o >>>>>>>>> shared among gtm and gtm_proxy. I will look check if this >>>>>>>>> requirement makes sense now (it did, when very first pgxc_clean was >>>>>>>>> implemented, which was rewritten at V 1.0). >>>>>>>>> >>>>>>>>> I'm afraid the cause of NULL pointer is different. >>>>>>>> >>>>>>>> Suzuki-san, >>>>>>>> >>>>>>>> Could you sort that out with plexo and review any patch he sends? >>>>>>>> You know this area of the code pretty well so I believe you are >>>>>>>> well-suited here. >>>>>>>> >>>>>>>> Regards, >>>>>>>> -- >>>>>>>> Michael Paquier >>>>>>>> http://michael.otacoo.com >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> |
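The offset mismatch named as the cause in the message above can be reproduced in miniature. The following is only a sketch, not the actual GTM headers: the thread gives the member names is_main_thread and thr_thread_context, while the member types, the other members (thr_id, thr_localid, thr_startroutine) and their order are assumptions made for illustration. Fixed-width integers stand in for pthread_t and pointers so that both word sizes can be compared on one machine, and the 64-bit result assumes a host where 8-byte integers are 8-byte aligned.

```python
import ctypes


def make_layout(word, with_flag):
    """Build a stand-in thread-info struct (hypothetical layout).

    word      -- integer type standing in for pthread_t and pointers
    with_flag -- whether the bool member is_main_thread is present
    """
    fields = [("thr_id", word), ("thr_localid", ctypes.c_uint32)]
    if with_flag:
        fields.append(("is_main_thread", ctypes.c_bool))
    fields += [("thr_startroutine", word), ("thr_thread_context", word)]

    class ThreadInfo(ctypes.Structure):
        _fields_ = fields

    return ThreadInfo


def context_offset(word, with_flag):
    """Byte offset of thr_thread_context in the chosen layout."""
    return make_layout(word, with_flag).thr_thread_context.offset


# Each pair is (offset with is_main_thread, offset without it).
offsets_32 = (context_offset(ctypes.c_uint32, True),
              context_offset(ctypes.c_uint32, False))
offsets_64 = (context_offset(ctypes.c_uint64, True),
              context_offset(ctypes.c_uint64, False))

print("32-bit words:", offsets_32)
print("64-bit words:", offsets_64)
```

With 4-byte words the one-byte flag plus its padding pushes every later member by one word. With 8-byte words the flag fits into padding that the 4-byte thr_localid leaves behind anyway, which would explain why the crash was seen only on 32-bit builds.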
From: Koichi S. <koi...@gm...> - 2012-06-20 03:12:30
|
Somehow this thread has been running in private, which is not good at all. The gtm_proxy crash in 32-bit environments has been discussed between me and Plexo Rama. I have now added it to the bug tracker with ID 3536469. So far I have found that, in a 32-bit environment, thr_thread_context and thr_current_context are not set properly: they are NULL. In 64-bit, on the other hand, all the thread information members are set to proper values. I am now looking for what causes this difference. Regards; ---------- Koichi Suzuki 2012/6/19 Koichi Suzuki <koi...@gm...> > > Hi, > > The problem is MemoryContextAllocZero receives NULL MemoryContext, which > shall be CurrentMemoryContext. palloc0() does this and > CurrentMemoryContext should have been set in BaseInit(). > > It's very straightforward and there're no struct involved and I need to > run it in the 32bit environment to see what is going on really. > > Anyway, this information saved much of my time. > > Thank you very much; > ---------- > Koichi Suzuki > > > > 2012/6/18 plexo rama <ple...@gm...> >> >> Suzuki-san, >> >> this is the output of the backtrace command: >> >> #0 0x08052bcc in MemoryContextAllocZero (context=0x0, size=128) at >> mcxt.c:590 >> #1 0x08051da1 in GTMProxy_ThreadAdd (thrinfo=0x807c538) at >> proxy_thread.c:72 >> #2 0x0804ac02 in BaseInit () at proxy_main.c:279 >> #3 0x0804bd3f in main (argc=3, argv=0xbffff564) at proxy_main.c:836 >> >> Please note that the line 590 of mcxt.c maps to line 585 in the original >> mcxt.c (wich is distributed in the v1.0.0 archive). >> You must have an instance of GTM running before starting gtm_proxy, >> otherwise the segfault won't occur. >> >> >> Plexo >> >> >> >> 2012/6/18 Koichi Suzuki <koi...@gm...> >>> >>> I'm trying to fix this though it is only for 32bit. >>> >>> Because it may take a bit for me to build 32bit environment, If you >>> have core file of GTM crash, it's very helpful if you send be back trace of >>> the core (bt command) by gdb. 
I hope back trace will pinpoint the cause of >>> the bug. >>> >>> Best Regards; >>> ---------- >>> Koichi Suzuki >>> >> > |
From: Michael P. <mic...@gm...> - 2012-06-20 00:58:26
|
Hi, Just a comment on this thread: the last couple of emails were not forwarded to the hackers ML. So I am putting the thread back on track so that people can see the message history. The missing messages are included below. Thanks. On Tue, Jun 19, 2012 at 5:30 PM, Koichi Suzuki <koi...@gm...> wrote: > Okay. I'm building my 32-bit environment and will look into it. Sorry I > have to find some time slot for this... > ---------- > Koichi Suzuki > > > > 2012/6/19 plexo rama <ple...@gm...> > >> Suzuki-San, >> >> I guess you didn't read my initial mail from 14th June, which states >> exactly the same analysis. However, as CurrentMemoryContext is involved, >> which is a macro and thus resolves to GetMyThreadInfo, which in turn >> references GTM_ThreadInfo, there is a structure involved in this. >> >> Actually, CurrentMemoryContext as used by palloc0() evaluates to 0x0 >> because the thr_current_context member of GTM_ThreadInfo contains 0x0. >> However, examining the structure returned by GetMyThreadInfo reveals that >> thr_message_context contains a value after BaseInit() / >> MemoryContextInit() is called, which should not be the case because >> thr_message_context is not initialized in MemoryContextInit(). Thus my >> conclusion: the structure accessed through CurrentMemoryContext, >> TopMemoryContext & ErrorContext in MemoryContextInit() points to the wrong >> memory location (off by one, so to say). >> >> I'm glad I could help to shed some light on this. >> >> Plexo >> >> >> >> 2012/6/19 Koichi Suzuki <koi...@gm...> >> >>> Hi, >>> >>> The problem is MemoryContextAllocZero receives NULL MemoryContext, which >>> shall be CurrentMemoryContext. palloc0() does this and >>> CurrentMemoryContext should have been set in BaseInit(). >>> >>> It's very straightforward and there're no struct involved and I need to >>> run it in the 32bit environment to see what is going on really. >>> >>> Anyway, this information saved much of my time. 
>>> >>> Thank you very much; >>> ---------- >>> Koichi Suzuki >>> >> > -- Michael Paquier http://michael.otacoo.com |
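Plexo's observation quoted above (thr_current_context reads as 0x0 while thr_message_context unexpectedly holds a value) is what one would expect when the same block of memory is written through one struct layout and read through another. Below is a minimal sketch of that effect under stated assumptions: 4-byte words emulate the 32-bit build, the member list and order are guesses apart from the names mentioned in the thread, the addresses are made up, and which side writes and which side reads is chosen only for illustration.

```python
import ctypes

WORD = ctypes.c_uint32  # stands in for a 32-bit pthread_t / pointer

# Context members shared by both layouts (order is an assumption).
CONTEXTS = [
    ("thr_thread_context", WORD),
    ("thr_message_context", WORD),
    ("thr_current_context", WORD),
    ("thr_error_context", WORD),
]


class WriterLayout(ctypes.Structure):
    # Layout that includes the extra bool member (assumed GTM_ThreadInfo shape).
    _fields_ = [
        ("thr_id", WORD),
        ("thr_localid", ctypes.c_uint32),
        ("is_main_thread", ctypes.c_bool),
        ("thr_startroutine", WORD),
    ] + CONTEXTS


class ReaderLayout(ctypes.Structure):
    # Same members minus is_main_thread (assumed GTMProxy_ThreadInfo shape).
    _fields_ = [
        ("thr_id", WORD),
        ("thr_localid", ctypes.c_uint32),
        ("thr_startroutine", WORD),
    ] + CONTEXTS


raw = bytearray(ctypes.sizeof(WriterLayout))  # one zeroed thread-info block

# Initialisation through the writer's layout: the thread, current and error
# contexts receive (made-up) addresses; the message context is never touched.
TOP, ERROR = 0x1000, 0x2000
writer = WriterLayout.from_buffer(raw)
writer.thr_thread_context = TOP
writer.thr_current_context = TOP
writer.thr_error_context = ERROR

# The very same bytes, interpreted through the reader's layout.
reader = ReaderLayout.from_buffer(raw)
seen = {name: getattr(reader, name) for name, _ in CONTEXTS}
for name, value in seen.items():
    print(f"{name:22s} {value:#x}")
```

Every context member appears shifted by one slot: the reader finds a value in thr_message_context and zero in thr_current_context, which would make a palloc0()-style call dereference a NULL context.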
From: Michael P. <mic...@gm...> - 2012-06-20 00:29:46
|
In this patch, the SQL/catalog management and the distribution mechanism use clearly separated APIs. So even if I do not think it is necessary to change the SQL part, the redistribution mechanism can be changed at will.

For the time being, the redistribution mechanism does not perform well (that was not the goal of this prototype), because it uses the following model:
1) Creation of a storage table (unlogged) with default distribution (CTAS)
2) Take necessary locks on storage and redistributed table
3) Delete all data on redistributed table
4) Update catalogs with new distribution information
5) Perform INSERT SELECT from storage table to redistributed table
6) DROP storage table

As mentioned by Ashutosh, the data needs to travel 4 times through the network, so it can really take a lot of time for tables with many gigabytes of data. The network in itself is not the bottleneck; it is the usage of the postgres framework. This is especially true when the queries used by the redistribution mechanism cannot be pushed down. The worst case is a table redistributed by hash/modulo on multiple nodes, as in this case it is necessary to plan each INSERT for the redistribution. As also mentioned by Ashutosh, this can create a large amount of xlog on remote Datanodes, which is not really welcome after a crash recovery.

There are several possible ways to dramatically improve the redistribution mechanism. Here are 3 ideas.

1) Create storage table with data on a single node
Here we reduce the load on the network, but it cannot solve the problem of tables redistributed by modulo/hash on multiple nodes. It will create a lot of INSERT queries for a slow result.

2) Use a COPY mechanism
One of the simple solutions. Instead of using a costly storage table in the cluster, store the data on the Coordinator during the redistribution:
a) COPY the data of the table being redistributed to a file in $PGDATA of the Coordinator. Why not $PGDATA/pg_distrib/oid?
b) DELETE all the data on the table being redistributed
c) Update catalogs
d) COPY FROM file to table with new distribution type
Network load is halved. COPY is also much faster. Coordinator servers are not chosen for their disk I/O, but the folder $PGDATA/pg_distrib could be linked to a folder where a faster disk is mounted. This also gets rid of the storage table. The only thing to take care of is the deletion of the temporary data file once the redistribution transaction commits or aborts. The data file could also be compressed to reduce the space consumed and the disk I/O.

3) Use a batching process to communicate only necessary tuples from Datanodes to Coordinator
Suggested by Ashutosh, this can use the COPY protocol to redistribute the tuples in batches. The idea is to send from Datanodes to Coordinator only the tuples that need to be redistributed, and then let the Coordinator redistribute all the data correctly depending on the new distribution. This avoids having to store the redistributed data temporarily, and all the transfer is managed by cache on the Coordinator. This idea has a couple of limitations though:
- A Datanode is not aware of the existence of the other nodes in the cluster. Currently distribution data is only available at the Coordinator in catalog pgxc_class, and this distribution data contains the list of nodes where data is distributed. This is directly dependent on catalog pgxc_node. So a Datanode cannot know whether a tuple will be at the correct place or not. This could be countered by allowing node DDLs to run on Datanodes, but this adds an additional constraint on cluster setup, as it forces the cluster designer to update the pgxc_node catalogs on all the nodes. Having a pgxc_node catalog on a Datanode would make sense if it communicated with other nodes through the same pooler as the Coordinator, but this also raises issues with multiple backends open on one node for the same session, which is dangerous for transaction handling.
- Visibility concerns. What ensures that a tuple has been selected only once? As redistribution is a cluster-wide mechanism, what can ensure that a scan on a Datanode does not take into account tuples that have already been redistributed?

Method 1 looks useless from the point of view of performance.
Method 2 should have good performance. The only point is that data has to be located on the Coordinator server temporarily while redistribution is being done. We could also use some GUC parameters to allow the DBA to customize the way the redistribution data folder is stored (compression type, file name format...).
I have some concerns about method 3 as explained above. I might not take into account all the potential problems or have a limited view on this mechanism, but it introduces some new dependencies with cluster setting which may not be necessary. However any discussion on the subject is welcome.

Suggestions are welcome.

On Wed, Jun 20, 2012 at 8:40 AM, Michael Paquier <mic...@gm...> wrote:
> On Wed, Jun 20, 2012 at 4:19 AM, Abbas Butt <abb...@en...> wrote:
>> You forgot to attach the patch.
>
> Sorry here is the patch.
>
>> On Tue, Jun 19, 2012 at 10:58 AM, Michael Paquier <mic...@gm...> wrote:
>>> Hi all,
>>>
>>> Please find attached an improved patch. I corrected the following points:
>>> - Storage table uses an access exclusive lock meaning it cannot be accessed by other sessions in cluster
>>> - The table redistributed uses an exclusive lock, it can be accessed by the other sessions in cluster with SELECT while redistribution is running
>>> - Addition of an API to manage table locking
>>> - Correction of bugs regarding session concurrency. An update in pgxc_class (update of distribution data) was not seen by concurrent sessions in cluster.
>>> - doc correction and completion >>> - regression fixes due to grammar change for node list in CTAS, CREATE >>> TABLE, EXECUTE DIRECT and CLEAN CONNECTION >>> - Fix of system functions using EXECUTE direct >>> - Fix for CTAS query generation >>> - update index of catalog pgxc_class updated >>> - Correct update for relation cache when location data is updated >>> >>> Questions are welcome. >>> This patch can be applied on master and works as expected. >>> >>> On Mon, Jun 18, 2012 at 5:25 PM, Michael Paquier < >>> mic...@gm...> wrote: >>> >>>> Hi all, >>>> >>>> Based on the design above, I went to the end of my idea and took a day >>>> to write a prototype for online redistribution based on ALTER TABLE. >>>> It uses the grammar written in previous mail with ADD NODE/DELETE >>>> NODE/DISTRIBUTE BY/TO NODE | GROUP. >>>> >>>> The main idea is the use of what I call a "storage" table which is used >>>> as a temporary location for the data being distributed in cluster. >>>> This table is created as unlogged >>>> >>>> The patch sticks with the design invocated before; >>>> - Cached plans are dropped when redistribution is invocated >>>> - Vacuum is not necessary, this mechanism uses transaction-safe queries >>>> - for the time being, this implementation uses an exclusive lock, but >>>> as the redistribution is done, a ShareUpdateExclusive lock is not to >>>> exclude. >>>> - tables are reindexed if necessary. >>>> - redistribution cannot be done inside a transaction block >>>> - redistribution is not authorized with all the other commands as they >>>> are locally-safe on each node. >>>> - no restrictions on the distribution types, table types or subclusters >>>> >>>> This feature can be really improved for example in the case of >>>> replicated tables in particular, when the list of nodes of the table is >>>> changed. 
>>>> It is one of the things I would like to improve as it would really >>>> increase performance >>>> >>>> Regards, >>>> >>>> -- >>>> Michael Paquier >>>> http://michael.otacoo.com >>>> >>> -- >>> Michael Paquier >>> http://michael.otacoo.com >>> >> -- >> Abbas >> Architect >> EnterpriseDB Corporation >> The Enterprise PostgreSQL Company >> > -- > Michael Paquier > http://michael.otacoo.com > -- Michael Paquier http://michael.otacoo.com |
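To make the idea behind method 3 in the message above more concrete, here is a small sketch of deciding which tuples actually have to travel when a modulo-distributed table gains a node. It is an illustration only: plain `key % number_of_nodes` stands in for whatever distribution function the cluster really uses, the node names dn1..dn3 are made up, and the planner is given both node lists, which is precisely the knowledge the message says a Datanode does not have today.

```python
def node_for(key, nodes):
    """Modulo distribution: pick the node that owns an integer key."""
    return nodes[key % len(nodes)]


def plan_moves(keys, old_nodes, new_nodes):
    """Return {key: (source, target)} for the tuples whose owner changes."""
    moves = {}
    for key in keys:
        source = node_for(key, old_nodes)
        target = node_for(key, new_nodes)
        if source != target:
            moves[key] = (source, target)
    return moves


keys = range(12)
old_nodes = ["dn1", "dn2"]          # distribution before ADD NODE
new_nodes = ["dn1", "dn2", "dn3"]   # distribution after ADD NODE

moves = plan_moves(keys, old_nodes, new_nodes)
staying = [k for k in keys if k not in moves]

print("tuples that must travel:", sorted(moves))
print("tuples that stay put:   ", staying)
```

Only the tuples listed in `moves` would need to go through the Coordinator; the others are already on the right node. How many tuples stay in place depends heavily on the distribution function and on how the node list changes, so the ratio in this toy run should not be read as a general result.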
From: Abbas B. <abb...@en...> - 2012-06-19 19:19:34
|
You forgot to attach the patch. -- Abbas Architect EnterpriseDB Corporation The Enterprise PostgreSQL Company |
From: Abbas B. <abb...@en...> - 2012-06-19 09:43:41
|
Thanks for your comments.

On Tue, Jun 19, 2012 at 1:54 PM, Ashutosh Bapat <ash...@en...> wrote:

> Hi Abbas,
> I have few comments to make
> 1. With this patch there are two variables for having command Id, that is
> going to cause confusion and will be a maintenance burden, might be error
> prone. Is it possible to use a single variable instead of two?

Are you talking about receivedCommandId and currentCommandId? If yes, I would prefer not to have a packet received from the coordinator overwrite currentCommandId at the data node. I am not 100% sure about the lifetime of currentCommandId, so I might overwrite it too early. It would be safer to leave currentCommandId as is until we are compelled to get the next command ID, and have the received command ID take priority at that time.

> Right now there is some code which is specific to cursors in your patch.
> If you can plug the coordinator command id somehow into currentCommandId,
> you won't need that code and any other code which needs coordinator command
> ID will be automatically taken care of.

That code is required to solve a problem. Consider the case where a coordinator receives this transaction:

BEGIN;
insert into tt1 values(1);
declare c50 cursor for select * from tt1;
insert into tt1 values(2);
fetch all from c50;
COMMIT;

While sending the select to the data node in response to a fetch, we need to know the command ID of the declare cursor statement, and we need to send that command ID to the data node for this particular fetch. This is the main idea behind this solution.

The first insert goes to the data node with command ID 0, the second insert goes with 2. Command ID 1 is consumed by declare cursor. When the coordinator sees the fetch, it needs to send the select to the data node with command ID 1 rather than 3.

> 2. A non-transaction on coordinator can spawn transactions on datanode or
> subtransactions (if there is already a transaction running). Does your
> patch handle that case?
No and it does not need to, because that case has no known problems that we need to solve. I don't think my patch would impact any such case but I will analyze any failures that I may get in regressions.

> Should we do more thorough research in the transaction management, esp. to
> see the impact of getting same command id for two commands on the datanode?

If we issue two commands with the same command ID then we will definitely have visibility issues according to the rules I have already explained. But we will not have two commands sent to the data node with same command id.

> On Tue, Jun 19, 2012 at 1:56 PM, Abbas Butt <abb...@en...> wrote:
>
>> Hi Ashutosh,
>> Here are the results with the val column, Thanks.
>>
>> test=# drop table mvcc_demo;
>> DROP TABLE
>> test=#
>> test=# create table mvcc_demo (val int);
>> CREATE TABLE
>> test=#
>> test=# TRUNCATE mvcc_demo;
>> TRUNCATE TABLE
>> test=#
>> test=# BEGIN;
>> BEGIN
>> test=# DELETE FROM mvcc_demo; -- increment command id to show that combo id would be different
>> DELETE 0
>> test=# DELETE FROM mvcc_demo;
>> DELETE 0
>> test=# DELETE FROM mvcc_demo;
>> DELETE 0
>> test=# INSERT INTO mvcc_demo VALUES (1);
>> INSERT 0 1
>> test=# INSERT INTO mvcc_demo VALUES (2);
>> INSERT 0 1
>> test=# INSERT INTO mvcc_demo VALUES (3);
>> INSERT 0 1
>> test=# SELECT t_xmin AS xmin,
>> test-#        t_xmax::text::int8 AS xmax,
>> test-#        t_field3::text::int8 AS cmin_cmax,
>> test-#        (t_infomask::integer & X'0020'::integer)::bool AS is_combocid
>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0))
>> test-# ORDER BY 2 DESC, 3;
>>  xmin  | xmax | cmin_cmax | is_combocid
>> -------+------+-----------+-------------
>>  80689 |    0 |         3 | f
>>  80689 |    0 |         4 | f
>>  80689 |    0 |         5 | f
>> (3 rows)
>>
>> test=#
>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val;
>>  xmin  | xmax | cmin | cmax | val
>> -------+------+------+------+-----
>>  80689 |    0 |    3 |    3 |   1
>>  80689 |    0 |    4 |    4 |   2
>>  80689 |    0 |    5 |    5 |   3
>>
>> (3 rows)
>>
>> test=#
>> test=# DELETE FROM mvcc_demo;
>> DELETE 3
>> test=# SELECT t_xmin AS xmin,
>> test-#        t_xmax::text::int8 AS xmax,
>> test-#        t_field3::text::int8 AS cmin_cmax,
>> test-#        (t_infomask::integer & X'0020'::integer)::bool AS is_combocid
>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0))
>> test-# ORDER BY 2 DESC, 3;
>>  xmin  | xmax  | cmin_cmax | is_combocid
>> -------+-------+-----------+-------------
>>  80689 | 80689 |         0 | t
>>  80689 | 80689 |         1 | t
>>  80689 | 80689 |         2 | t
>> (3 rows)
>>
>> test=#
>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val;
>>  xmin | xmax | cmin | cmax | val
>> ------+------+------+------+-----
>> (0 rows)
>>
>> test=#
>> test=# END;
>> COMMIT
>> test=#
>> test=#
>> test=# TRUNCATE mvcc_demo;
>> TRUNCATE TABLE
>>
>> test=# BEGIN;
>> BEGIN
>> test=# INSERT INTO mvcc_demo VALUES (1);
>> INSERT 0 1
>> test=# INSERT INTO mvcc_demo VALUES (2);
>> INSERT 0 1
>> test=# INSERT INTO mvcc_demo VALUES (3);
>> INSERT 0 1
>> test=# SELECT t_xmin AS xmin,
>> test-#        t_xmax::text::int8 AS xmax,
>> test-#        t_field3::text::int8 AS cmin_cmax,
>> test-#        (t_infomask::integer & X'0020'::integer)::bool AS is_combocid
>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0))
>> test-# ORDER BY 2 DESC, 3;
>>  xmin  | xmax | cmin_cmax | is_combocid
>> -------+------+-----------+-------------
>>  80693 |    0 |         0 | f
>>  80693 |    0 |         1 | f
>>  80693 |    0 |         2 | f
>> (3 rows)
>>
>> test=#
>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val;
>>  xmin  | xmax | cmin | cmax | val
>> -------+------+------+------+-----
>>  80693 |    0 |    0 |    0 |   1
>>  80693 |    0 |    1 |    1 |   2
>>  80693 |    0 |    2 |    2 |   3
>> (3 rows)
>>
>> test=#
>> test=# UPDATE mvcc_demo SET val = 10;
>>
>> UPDATE 3
>> test=#
>> test=# SELECT t_xmin AS xmin,
>> test-#        t_xmax::text::int8 AS xmax,
>> test-#        t_field3::text::int8 AS cmin_cmax,
>> test-#        (t_infomask::integer & X'0020'::integer)::bool AS is_combocid
>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0))
>> test-# ORDER BY 2 DESC, 3;
>>  xmin  | xmax  | cmin_cmax | is_combocid
>> -------+-------+-----------+-------------
>>  80693 | 80693 |         0 | t
>>  80693 | 80693 |         1 | t
>>  80693 | 80693 |         2 | t
>>  80693 |     0 |         3 | f
>>  80693 |     0 |         3 | f
>>  80693 |     0 |         3 | f
>> (6 rows)
>>
>> test=#
>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val;
>>  xmin  | xmax | cmin | cmax | val
>> -------+------+------+------+-----
>>  80693 |    0 |    3 |    3 |  10
>>  80693 |    0 |    3 |    3 |  10
>>  80693 |    0 |    3 |    3 |  10
>> (3 rows)
>>
>> test=#
>> test=# END;
>> COMMIT
>> test=#
>> test=# TRUNCATE mvcc_demo;
>> TRUNCATE TABLE
>>
>> -- From one psql issue
>> test=# INSERT INTO mvcc_demo VALUES (1);
>> INSERT 0 1
>> test=# SELECT t_xmin AS xmin,
>> test-#        t_xmax::text::int8 AS xmax,
>> test-#        t_field3::text::int8 AS cmin_cmax,
>> test-#        (t_infomask::integer & X'0020'::integer)::bool AS is_combocid
>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0))
>> test-# ORDER BY 2 DESC, 3;
>>  xmin  | xmax | cmin_cmax | is_combocid
>> -------+------+-----------+-------------
>>  80699 |    0 |         0 | f
>> (1 row)
>>
>> test=#
>> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val;
>>  xmin  | xmax | cmin | cmax | val
>> -------+------+------+------+-----
>>  80699 |    0 |    0 |    0 |   1
>> (1 row)
>>
>> test=# -- From another issue
>> test=# BEGIN;
>> BEGIN
>> test=# INSERT INTO mvcc_demo VALUES (2);
>> INSERT 0 1
>> test=# INSERT INTO mvcc_demo VALUES (3);
>> INSERT 0 1
>> test=# INSERT INTO mvcc_demo VALUES (4);
>> INSERT 0 1
>> test=# SELECT t_xmin AS xmin,
>> test-#        t_xmax::text::int8 AS xmax,
>> test-#        t_field3::text::int8 AS cmin_cmax,
>> test-#        (t_infomask::integer & X'0020'::integer)::bool AS is_combocid
>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0))
>> test-# ORDER BY 2 DESC, 3;
>>  xmin  | xmax | cmin_cmax | is_combocid
>> -------+------+-----------+-------------
>>  80699 |    0 |         0 | f
80700 | 0 | 0 | f >> 80700 | 0 | 1 | f >> 80700 | 0 | 2 | f >> (4 rows) >> >> test=# >> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> xmin | xmax | cmin | cmax | val >> -------+------+------+------+----- >> 80699 | 0 | 0 | 0 | 1 >> 80700 | 0 | 0 | 0 | 2 >> 80700 | 0 | 1 | 1 | 3 >> 80700 | 0 | 2 | 2 | 4 >> (4 rows) >> >> test=# >> test=# UPDATE mvcc_demo SET val = 10; >> >> UPDATE 4 >> test=# >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+-------+-----------+------------- >> 80700 | 80700 | 0 | t >> 80700 | 80700 | 1 | t >> 80700 | 80700 | 2 | t >> 80699 | 80700 | 3 | f >> 80700 | 0 | 3 | f >> 80700 | 0 | 3 | f >> 80700 | 0 | 3 | f >> 80700 | 0 | 3 | f >> (8 rows) >> >> test=# >> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> xmin | xmax | cmin | cmax | val >> -------+------+------+------+----- >> 80700 | 0 | 3 | 3 | 10 >> 80700 | 0 | 3 | 3 | 10 >> 80700 | 0 | 3 | 3 | 10 >> 80700 | 0 | 3 | 3 | 10 >> (4 rows) >> >> >> >> >> test=# -- Before finishing this, issue these from the first psql >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+-------+-----------+------------- >> 80700 | 80700 | 0 | t >> 80700 | 80700 | 1 | t >> 80700 | 80700 | 2 | t >> 80699 | 80700 | 3 | f >> 80700 | 0 | 3 | f >> 80700 | 0 | 3 | f >> 80700 | 0 | 3 | f >> 80700 | 0 | 3 | f >> (8 rows) >> >> test=# >> test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; >> xmin | xmax | cmin | 
cmax | val >> -------+-------+------+------+----- >> 80699 | 80700 | 3 | 3 | 1 >> (1 row) >> >> test=# end; >> COMMIT >> >> >> On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier < >> mic...@gm...> wrote: >> >>> Hi, >>> >>> I expect pgxc_node_send_cmd_id to have some impact on performance, so be >>> sure to send it to remote Datanodes really only if necessary. >>> You should put more severe conditions blocking this function cid can >>> easily get incremented in Postgres. >>> >>> Regards, >>> >>> On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt <abb...@en... >>> > wrote: >>> >>>> PFA a WIP patch implementing the design presented earlier. >>>> The patch is WIP because it still has and FIXME and it shows some >>>> regression failures that need to be fixed, but other than that it confirms >>>> that the suggested design would work fine. The following test cases now >>>> work fine >>>> >>>> drop table tt1; >>>> create table tt1(f1 int) distribute by replication; >>>> >>>> >>>> BEGIN; >>>> insert into tt1 values(1); >>>> declare c50 cursor for select * from tt1; >>>> insert into tt1 values(2); >>>> fetch all from c50; >>>> COMMIT; >>>> truncate table tt1; >>>> >>>> BEGIN; >>>> >>>> declare c50 cursor for select * from tt1; >>>> insert into tt1 values(1); >>>> >>>> insert into tt1 values(2); >>>> fetch all from c50; >>>> COMMIT; >>>> truncate table tt1; >>>> >>>> >>>> BEGIN; >>>> insert into tt1 values(1); >>>> insert into tt1 values(2); >>>> >>>> declare c50 cursor for select * from tt1; >>>> insert into tt1 values(3); >>>> >>>> fetch all from c50; >>>> COMMIT; >>>> truncate table tt1; >>>> >>>> >>>> BEGIN; >>>> insert into tt1 values(1); >>>> declare c50 cursor for select * from tt1; >>>> insert into tt1 values(2); >>>> declare c51 cursor for select * from tt1; >>>> insert into tt1 values(3); >>>> fetch all from c50; >>>> fetch all from c51; >>>> COMMIT; >>>> truncate table tt1; >>>> >>>> >>>> BEGIN; >>>> insert into tt1 values(1); >>>> declare c50 cursor for select * from tt1; 
>>>> declare c51 cursor for select * from tt1;
>>>> insert into tt1 values(2);
>>>> insert into tt1 values(3);
>>>> fetch all from c50;
>>>> fetch all from c51;
>>>> COMMIT;
>>>> truncate table tt1;
>>>>
>>>>
>>>> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt <abb...@en...> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> In a multi-statement transaction each statement is given a command
>>>>> identifier, starting from zero and incrementing for each statement.
>>>>> These command identifiers are required for extra tracking because each
>>>>> statement has its own visibility rules within the transaction.
>>>>> For example, a cursor's contents must remain unchanged even if later
>>>>> statements in the same transaction modify rows. Such tracking is
>>>>> implemented using the system command ID columns cmin/cmax, which are
>>>>> internally a single column.
>>>>>
>>>>> cmin/cmax come into play in multi-statement transactions only;
>>>>> they are both zero otherwise.
>>>>>
>>>>> cmin "The command identifier of the statement within the inserting
>>>>> transaction."
>>>>> cmax "The command identifier of the statement within the deleting
>>>>> transaction."
>>>>>
>>>>> Here are the visibility rules (taken from the comments in tqual.c):
>>>>>
>>>>> (                                // A heap tuple is valid "now" iff
>>>>>   Xmin == my-transaction &&      // inserted by the current transaction
>>>>>   Cmin < my-command &&           // before this command, and
>>>>>   (
>>>>>     Xmax is null ||              // the row has not been deleted, or
>>>>>     (
>>>>>       Xmax == my-transaction &&  // it was deleted by the current transaction
>>>>>       Cmax >= my-command         // but not before this command,
>>>>>     )
>>>>>   )
>>>>> )
>>>>> ||                               // or
>>>>> (
>>>>>   Xmin is committed &&           // the row was inserted by a committed transaction, and
>>>>>   (
>>>>>     Xmax is null ||              // the row has not been deleted, or
>>>>>     (
>>>>>       Xmax == my-transaction &&  // the row is being deleted by this transaction
>>>>>       Cmax >= my-command         // but it's not deleted "yet", or
>>>>>     ) ||
>>>>>     (
>>>>>       Xmax != my-transaction &&  // the row was deleted by another transaction
>>>>>       Xmax is not committed      // that has not been committed
>>>>>     )
>>>>>   )
>>>>> )
>>>>>
>>>>> Because cmin and cmax are internally a single system column,
>>>>> it is not possible to simply record the status of a row
>>>>> that is created and expired in the same multi-statement transaction.
>>>>> For that reason, a special combo command ID is created that references
>>>>> a local memory hash that contains the actual cmin and cmax values.
>>>>> This means that if a combo ID is being used, the number we are seeing
>>>>> is not the cmin or cmax; it is an index into a local
>>>>> array that contains a structure that has the actual cmin and cmax
>>>>> values.
>>>>>
>>>>> The following queries (taken mostly from
>>>>> http://momjian.us/main/writings/pgsql/mvcc.pdf)
>>>>> use the contrib module pageinspect, which allows
>>>>> visibility of internal heap page structures and all stored rows,
>>>>> including those not visible in the current snapshot.
>>>>> (Bit 0x0020 is defined as HEAP_COMBOCID.) >>>>> >>>>> We are exploring 3 examples here: >>>>> 1) INSERT & DELETE in a single transaction >>>>> 2) INSERT & UPDATE in a single transaction >>>>> 3) INSERT from two different transactions & UPDATE from one >>>>> >>>>> test=# drop table mvcc_demo; >>>>> DROP TABLE >>>>> test=# >>>>> test=# create table mvcc_demo (val int); >>>>> CREATE TABLE >>>>> test=# >>>>> test=# TRUNCATE mvcc_demo; >>>>> TRUNCATE TABLE >>>>> test=# >>>>> test=# BEGIN; >>>>> BEGIN >>>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >>>>> combo id would be different >>>>> DELETE 0 >>>>> test=# DELETE FROM mvcc_demo; >>>>> DELETE 0 >>>>> test=# DELETE FROM mvcc_demo; >>>>> DELETE 0 >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>> INSERT 0 1 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+------+-----------+------------- >>>>> 80685 | 0 | 3 | f >>>>> 80685 | 0 | 4 | f >>>>> 80685 | 0 | 5 | f >>>>> (3 rows) >>>>> >>>>> test=# >>>>> test=# DELETE FROM mvcc_demo; >>>>> DELETE 3 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+-------+-----------+------------- >>>>> 80685 | 80685 | 0 | t >>>>> 80685 | 80685 | 1 | t >>>>> 80685 | 80685 | 2 | t >>>>> (3 rows) >>>>> >>>>> Note that since 
is_combocid is true the numbers are not cmin/cmax they >>>>> are actually >>>>> the indexes of the internal array already explained above. >>>>> combo id index 0 would contain cmin 3, cmax 6 >>>>> combo id index 1 would contain cmin 4, cmax 6 >>>>> combo id index 2 would contain cmin 5, cmax 6 >>>>> >>>>> test=# >>>>> test=# END; >>>>> COMMIT >>>>> test=# >>>>> test=# >>>>> test=# TRUNCATE mvcc_demo; >>>>> TRUNCATE TABLE >>>>> test=# >>>>> test=# >>>>> test=# >>>>> test=# BEGIN; >>>>> BEGIN >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>> INSERT 0 1 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+------+-----------+------------- >>>>> 80675 | 0 | 0 | f >>>>> 80675 | 0 | 1 | f >>>>> 80675 | 0 | 2 | f >>>>> (3 rows) >>>>> >>>>> test=# >>>>> test=# UPDATE mvcc_demo SET val = val * 10; >>>>> UPDATE 3 >>>>> test=# >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+-------+-----------+------------- >>>>> 80675 | 80675 | 0 | t >>>>> 80675 | 80675 | 1 | t >>>>> 80675 | 80675 | 2 | t >>>>> 80675 | 0 | 3 | f >>>>> 80675 | 0 | 3 | f >>>>> 80675 | 0 | 3 | f >>>>> (6 rows) >>>>> >>>>> test=# >>>>> test=# END; >>>>> COMMIT >>>>> test=# >>>>> test=# >>>>> test=# TRUNCATE mvcc_demo; >>>>> TRUNCATE TABLE >>>>> test=# >>>>> >>>>> -- From one 
psql issue >>>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>>> INSERT 0 1 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+------+-----------+------------- >>>>> 80677 | 0 | 0 | f >>>>> (1 row) >>>>> >>>>> >>>>> test=# -- From another issue >>>>> test=# BEGIN; >>>>> BEGIN >>>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>>> INSERT 0 1 >>>>> test=# INSERT INTO mvcc_demo VALUES (4); >>>>> INSERT 0 1 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+------+-----------+------------- >>>>> 80677 | 0 | 0 | f >>>>> 80678 | 0 | 0 | f >>>>> 80678 | 0 | 1 | f >>>>> 80678 | 0 | 2 | f >>>>> (4 rows) >>>>> >>>>> test=# >>>>> test=# UPDATE mvcc_demo SET val = val * 10; >>>>> UPDATE 4 >>>>> test=# SELECT t_xmin AS xmin, >>>>> test-# t_xmax::text::int8 AS xmax, >>>>> test-# t_field3::text::int8 AS cmin_cmax, >>>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>>> is_combocid >>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>>> test-# ORDER BY 2 DESC, 3; >>>>> xmin | xmax | cmin_cmax | is_combocid >>>>> -------+-------+-----------+------------- >>>>> 80678 | 80678 | 0 | t >>>>> 80678 | 80678 | 1 | t >>>>> 80678 | 80678 | 2 | t >>>>> 80677 | 80678 | 3 | f >>>>> 80678 | 0 | 3 | f >>>>> 80678 | 0 | 3 | f >>>>> 80678 | 0 | 3 | f >>>>> 80678 | 0 | 3 | f >>>>> (8 rows) >>>>> >>>>> 
test=#
>>>>>
>>>>> test=# -- Before finishing this, issue these from the first psql
>>>>> test=# SELECT t_xmin AS xmin,
>>>>> test-#        t_xmax::text::int8 AS xmax,
>>>>> test-#        t_field3::text::int8 AS cmin_cmax,
>>>>> test-#        (t_infomask::integer & X'0020'::integer)::bool AS is_combocid
>>>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0))
>>>>> test-# ORDER BY 2 DESC, 3;
>>>>>  xmin  | xmax  | cmin_cmax | is_combocid
>>>>> -------+-------+-----------+-------------
>>>>>  80678 | 80678 |         0 | t
>>>>>  80678 | 80678 |         1 | t
>>>>>  80678 | 80678 |         2 | t
>>>>>  80677 | 80678 |         3 | f
>>>>>  80678 |     0 |         3 | f
>>>>>  80678 |     0 |         3 | f
>>>>>  80678 |     0 |         3 | f
>>>>>  80678 |     0 |         3 | f
>>>>> (8 rows)
>>>>>
>>>>> test=# END;
>>>>> COMMIT
>>>>>
>>>>>
>>>>> Now consider the case we are trying to solve:
>>>>>
>>>>> drop table tt1;
>>>>> create table tt1(f1 int);
>>>>>
>>>>> BEGIN;
>>>>> insert into tt1 values(1);
>>>>> declare c50 cursor for select * from tt1; -- should show one row only
>>>>> insert into tt1 values(2);
>>>>> fetch all from c50;
>>>>> COMMIT;
>>>>>
>>>>>
>>>>> Consider the data node 1 log:
>>>>>
>>>>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read committed READ WRITE]
>>>>> (b) [exec_simple_query][1026][drop table tt1;]
>>>>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075']
>>>>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075']
>>>>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read committed READ WRITE]
>>>>> (f) [exec_simple_query][1026][create table tt1(f1 int);]
>>>>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077']
>>>>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077']
>>>>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read committed READ WRITE]
>>>>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)]
>>>>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)]
>>>>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() FROM tt1]
>>>>> (m) [exec_simple_query][1026][COMMIT TRANSACTION]
>>>>>
>>>>> The cursor currently shows both inserted rows because the command ID
>>>>> at the data node in
>>>>> step (j) is 0
>>>>> step (k) is 1 &
>>>>> step (l) is 2
>>>>>
>>>>> Whereas we need the command IDs to be
>>>>>
>>>>> step (j) should be 0
>>>>> step (k) should be 2 &
>>>>> step (l) should be 1
>>>>>
>>>>> This will solve the cursor visibility problem.
>>>>>
>>>>> To implement this I suggest we send command IDs to data nodes from the
>>>>> coordinator, like we send the gxid. The only difference will be that we
>>>>> do not need to take command IDs from GTM, since they are only valid
>>>>> within the transaction.
>>>>>
>>>>> See this example:
>>>>>
>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>>>>  xmin | xmax | cmin | cmax | f1
>>>>> ------+------+------+------+----
>>>>> (0 rows)
>>>>>
>>>>> test=# begin;
>>>>> BEGIN
>>>>> test=# insert into tt1 values(1);
>>>>> INSERT 0 1
>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>>>>  xmin  | xmax | cmin | cmax | f1
>>>>> -------+------+------+------+----
>>>>>  80615 |    0 |    0 |    0 |  1
>>>>> (1 row)
>>>>>
>>>>> test=# insert into tt1 values(2);
>>>>> INSERT 0 1
>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>>>>  xmin  | xmax | cmin | cmax | f1
>>>>> -------+------+------+------+----
>>>>>  80615 |    0 |    0 |    0 |  1
>>>>>  80615 |    0 |    1 |    1 |  2
>>>>> (2 rows)
>>>>>
>>>>> test=# insert into tt1 values(3);
>>>>> INSERT 0 1
>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>>>>  xmin  | xmax | cmin | cmax | f1
>>>>> -------+------+------+------+----
>>>>>  80615 |    0 |    0 |    0 |  1
>>>>>  80615 |    0 |    1 |    1 |  2
>>>>>  80615 |    0 |    2 |    2 |  3
>>>>> (3 rows)
>>>>>
>>>>> test=# insert into tt1 values(4);
>>>>> INSERT 0 1
>>>>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>>>>  xmin  | xmax | cmin | cmax | f1
>>>>> -------+------+------+------+----
>>>>>  80615 |    0 |    0 |    0 |  1
>>>>>  80615 |    0 |    1 |    1 |  2
>>>>>  80615 |    0 |    2 |    2 |  3
>>>>>  80615 |    0
| 3 | 3 | 4 >>>>> (4 rows) >>>>> >>>>> test=# end; >>>>> COMMIT >>>>> test=# >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>> xmin | xmax | cmin | cmax | f1 >>>>> -------+------+------+------+---- >>>>> 80615 | 0 | 0 | 0 | 1 >>>>> 80615 | 0 | 1 | 1 | 2 >>>>> 80615 | 0 | 2 | 2 | 3 >>>>> 80615 | 0 | 3 | 3 | 4 >>>>> (4 rows) >>>>> >>>>> test=# insert into tt1 values(5); >>>>> INSERT 0 1 >>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>> xmin | xmax | cmin | cmax | f1 >>>>> -------+------+------+------+---- >>>>> 80615 | 0 | 0 | 0 | 1 >>>>> 80615 | 0 | 1 | 1 | 2 >>>>> 80615 | 0 | 2 | 2 | 3 >>>>> 80615 | 0 | 3 | 3 | 4 >>>>> 80616 | 0 | 0 | 0 | 5 >>>>> (5 rows) >>>>> >>>>> test=# insert into tt1 values(6); >>>>> INSERT 0 1 >>>>> test=# >>>>> test=# >>>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>>> xmin | xmax | cmin | cmax | f1 >>>>> -------+------+------+------+---- >>>>> 80615 | 0 | 0 | 0 | 1 >>>>> 80615 | 0 | 1 | 1 | 2 >>>>> 80615 | 0 | 2 | 2 | 3 >>>>> 80615 | 0 | 3 | 3 | 4 >>>>> 80616 | 0 | 0 | 0 | 5 >>>>> 80617 | 0 | 0 | 0 | 6 >>>>> (6 rows) >>>>> >>>>> Note that at the end of the multi-statement transaction the command id >>>>> gets reset to zero. >>>>> >>>>> -- >>>>> Abbas >>>>> Architect >>>>> EnterpriseDB Corporation >>>>> The Enterprise PostgreSQL Company >>>> >>>> >>>> >>>> >>>> -- >>>> -- >>>> Abbas >>>> Architect >>>> EnterpriseDB Corporation >>>> The Enterprise PostgreSQL Company >>>> >>>> Phone: 92-334-5100153 >>>> >>>> Website: www.enterprisedb.com >>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>>> >>>> This e-mail message (and any attachment) is intended for the use of >>>> the individual or entity to whom it is addressed. 
This message
>>>> contains information from EnterpriseDB Corporation that may be
>>>> privileged, confidential, or exempt from disclosure under applicable
>>>> law. If you are not the intended recipient or authorized to receive
>>>> this for the intended recipient, any use, dissemination, distribution,
>>>> retention, archiving, or copying of this communication is strictly
>>>> prohibited. If you have received this e-mail in error, please notify
>>>> the sender immediately by reply e-mail and delete this message.
>>>>
>>>> _______________________________________________
>>>> Postgres-xc-developers mailing list
>>>> Pos...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>>>
>>> --
>>> Michael Paquier
>>> http://michael.otacoo.com
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Enterprise Postgres Company

--
Abbas
Architect
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
|
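
A minimal sketch of the cursor case discussed in this reply, assuming only the own-transaction branch of the visibility rules quoted above (Cmin < my-command, and the row is not yet deleted as of my-command). The names Row, visible_in_own_xact and fetch_all are hypothetical and are not part of Postgres-XC or of the patch; the command IDs 0, 2, 1 and 3 are the ones given in the reply for the two inserts, the declare cursor, and the fetch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Row:
    """Hypothetical stand-in for a heap tuple written by the current transaction."""
    value: int
    cmin: int                   # command ID of the inserting statement
    cmax: Optional[int] = None  # command ID of the deleting statement, None if not deleted


def visible_in_own_xact(row: Row, my_command: int) -> bool:
    # Own-transaction branch of the quoted rules:
    # inserted before this command, and not deleted before this command.
    inserted_before = row.cmin < my_command
    not_deleted_yet = row.cmax is None or row.cmax >= my_command
    return inserted_before and not_deleted_yet


def fetch_all(rows: List[Row], command_id: int) -> List[int]:
    """Return the values a scan running with the given command ID would see."""
    return [r.value for r in rows if visible_in_own_xact(r, command_id)]


# insert (1) ran with command ID 0, declare cursor consumed 1, insert (2) ran with 2.
rows = [Row(1, cmin=0), Row(2, cmin=2)]

print(fetch_all(rows, 1))  # scan sent with the cursor's command ID: [1]
print(fetch_all(rows, 3))  # scan sent with the fetch's own command ID: [1, 2]
```

Under these assumptions, sending the select with the command ID of the declare cursor statement hides the row inserted after the cursor was declared, while sending it with the command ID of the fetch does not.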
From: Ashutosh B. <ash...@en...> - 2012-06-19 08:54:47
|
Hi Abbas,

I have a few comments to make.

1. With this patch there are two variables holding the command ID. That is going to cause confusion, will be a maintenance burden, and might be error prone. Is it possible to use a single variable instead of two?

Right now there is some code which is specific to cursors in your patch. If you can plug the coordinator command ID somehow into currentCommandId, you won't need that code, and any other code which needs the coordinator command ID will be automatically taken care of.

2. A non-transaction on the coordinator can spawn transactions on the datanode, or subtransactions (if there is already a transaction running). Does your patch handle that case?

Should we do more thorough research into the transaction management, especially to see the impact of getting the same command ID for two commands on the datanode?

On Tue, Jun 19, 2012 at 1:56 PM, Abbas Butt <abb...@en...> wrote:

> Hi Ashutosh,
> Here are the results with the val column, Thanks.
>
> test=# drop table mvcc_demo;
> DROP TABLE
> test=#
> test=# create table mvcc_demo (val int);
> CREATE TABLE
> test=#
> test=# TRUNCATE mvcc_demo;
> TRUNCATE TABLE
> test=#
> test=# BEGIN;
> BEGIN
> test=# DELETE FROM mvcc_demo; -- increment command id to show that combo id would be different
> DELETE 0
> test=# DELETE FROM mvcc_demo;
> DELETE 0
> test=# DELETE FROM mvcc_demo;
> DELETE 0
> test=# INSERT INTO mvcc_demo VALUES (1);
> INSERT 0 1
> test=# INSERT INTO mvcc_demo VALUES (2);
> INSERT 0 1
> test=# INSERT INTO mvcc_demo VALUES (3);
> INSERT 0 1
> test=# SELECT t_xmin AS xmin,
> test-#        t_xmax::text::int8 AS xmax,
> test-#        t_field3::text::int8 AS cmin_cmax,
> test-#        (t_infomask::integer & X'0020'::integer)::bool AS is_combocid
> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0))
> test-# ORDER BY 2 DESC, 3;
>  xmin  | xmax | cmin_cmax | is_combocid
> -------+------+-----------+-------------
>  80689 |    0 |         3 | f
>  80689 |    0 |         4 | f
>  80689 |    0 |         5 | f
> (3 rows)
>
> test=#
> test=# select
xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > xmin | xmax | cmin | cmax | val > -------+------+------+------+----- > 80689 | 0 | 3 | 3 | 1 > 80689 | 0 | 4 | 4 | 2 > 80689 | 0 | 5 | 5 | 3 > > (3 rows) > > test=# > test=# DELETE FROM mvcc_demo; > DELETE 3 > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+-------+-----------+------------- > 80689 | 80689 | 0 | t > 80689 | 80689 | 1 | t > 80689 | 80689 | 2 | t > (3 rows) > > test=# > test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > xmin | xmax | cmin | cmax | val > ------+------+------+------+----- > (0 rows) > > > test=# > test=# END; > COMMIT > test=# > test=# > test=# TRUNCATE mvcc_demo; > TRUNCATE TABLE > > > > > > > > > > > test=# BEGIN; > BEGIN > test=# INSERT INTO mvcc_demo VALUES (1); > INSERT 0 1 > test=# INSERT INTO mvcc_demo VALUES (2); > INSERT 0 1 > test=# INSERT INTO mvcc_demo VALUES (3); > INSERT 0 1 > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+------+-----------+------------- > 80693 | 0 | 0 | f > 80693 | 0 | 1 | f > 80693 | 0 | 2 | f > (3 rows) > > test=# > test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > xmin | xmax | cmin | cmax | val > -------+------+------+------+----- > 80693 | 0 | 0 | 0 | 1 > 80693 | 0 | 1 | 1 | 2 > 80693 | 0 | 2 | 2 | 3 > (3 rows) > > test=# > test=# UPDATE mvcc_demo SET val = 10; > > UPDATE 3 > test=# > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# 
t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+-------+-----------+------------- > 80693 | 80693 | 0 | t > 80693 | 80693 | 1 | t > 80693 | 80693 | 2 | t > 80693 | 0 | 3 | f > 80693 | 0 | 3 | f > 80693 | 0 | 3 | f > (6 rows) > > test=# > test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > xmin | xmax | cmin | cmax | val > -------+------+------+------+----- > 80693 | 0 | 3 | 3 | 10 > 80693 | 0 | 3 | 3 | 10 > 80693 | 0 | 3 | 3 | 10 > (3 rows) > > > test=# > test=# END; > COMMIT > test=# > test=# TRUNCATE mvcc_demo; > TRUNCATE TABLE > > > > > > > > > > > > -- From one psql issue > test=# INSERT INTO mvcc_demo VALUES (1); > INSERT 0 1 > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS > is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+------+-----------+------------- > 80699 | 0 | 0 | f > (1 row) > > test=# > test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > xmin | xmax | cmin | cmax | val > -------+------+------+------+----- > 80699 | 0 | 0 | 0 | 1 > (1 row) > > > > > > test=# -- From another issue > test=# BEGIN; > BEGIN > test=# INSERT INTO mvcc_demo VALUES (2); > INSERT 0 1 > test=# INSERT INTO mvcc_demo VALUES (3); > INSERT 0 1 > test=# INSERT INTO mvcc_demo VALUES (4); > INSERT 0 1 > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > 
-------+------+-----------+------------- > 80699 | 0 | 0 | f > 80700 | 0 | 0 | f > 80700 | 0 | 1 | f > 80700 | 0 | 2 | f > (4 rows) > > test=# > test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > xmin | xmax | cmin | cmax | val > -------+------+------+------+----- > 80699 | 0 | 0 | 0 | 1 > 80700 | 0 | 0 | 0 | 2 > 80700 | 0 | 1 | 1 | 3 > 80700 | 0 | 2 | 2 | 4 > (4 rows) > > test=# > test=# UPDATE mvcc_demo SET val = 10; > > UPDATE 4 > test=# > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+-------+-----------+------------- > 80700 | 80700 | 0 | t > 80700 | 80700 | 1 | t > 80700 | 80700 | 2 | t > 80699 | 80700 | 3 | f > 80700 | 0 | 3 | f > 80700 | 0 | 3 | f > 80700 | 0 | 3 | f > 80700 | 0 | 3 | f > (8 rows) > > test=# > test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > xmin | xmax | cmin | cmax | val > -------+------+------+------+----- > 80700 | 0 | 3 | 3 | 10 > 80700 | 0 | 3 | 3 | 10 > 80700 | 0 | 3 | 3 | 10 > 80700 | 0 | 3 | 3 | 10 > (4 rows) > > > > > test=# -- Before finishing this, issue these from the first psql > test=# SELECT t_xmin AS xmin, > test-# t_xmax::text::int8 AS xmax, > test-# t_field3::text::int8 AS cmin_cmax, > test-# (t_infomask::integer & X'0020'::integer)::bool AS > is_combocid > test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) > test-# ORDER BY 2 DESC, 3; > xmin | xmax | cmin_cmax | is_combocid > -------+-------+-----------+------------- > 80700 | 80700 | 0 | t > 80700 | 80700 | 1 | t > 80700 | 80700 | 2 | t > 80699 | 80700 | 3 | f > 80700 | 0 | 3 | f > 80700 | 0 | 3 | f > 80700 | 0 | 3 | f > 80700 | 0 | 3 | f > (8 rows) > > test=# > test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; > xmin | xmax | cmin | cmax | val > 
-------+-------+------+------+----- > 80699 | 80700 | 3 | 3 | 1 > (1 row) > > test=# end; > COMMIT > > > On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier < > mic...@gm...> wrote: > >> Hi, >> >> I expect pgxc_node_send_cmd_id to have some impact on performance, so be >> sure to send it to remote Datanodes really only if necessary. >> You should put more severe conditions blocking this function cid can >> easily get incremented in Postgres. >> >> Regards, >> >> On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt <abb...@en...>wrote: >> >>> PFA a WIP patch implementing the design presented earlier. >>> The patch is WIP because it still has and FIXME and it shows some >>> regression failures that need to be fixed, but other than that it confirms >>> that the suggested design would work fine. The following test cases now >>> work fine >>> >>> drop table tt1; >>> create table tt1(f1 int) distribute by replication; >>> >>> >>> BEGIN; >>> insert into tt1 values(1); >>> declare c50 cursor for select * from tt1; >>> insert into tt1 values(2); >>> fetch all from c50; >>> COMMIT; >>> truncate table tt1; >>> >>> BEGIN; >>> >>> declare c50 cursor for select * from tt1; >>> insert into tt1 values(1); >>> >>> insert into tt1 values(2); >>> fetch all from c50; >>> COMMIT; >>> truncate table tt1; >>> >>> >>> BEGIN; >>> insert into tt1 values(1); >>> insert into tt1 values(2); >>> >>> declare c50 cursor for select * from tt1; >>> insert into tt1 values(3); >>> >>> fetch all from c50; >>> COMMIT; >>> truncate table tt1; >>> >>> >>> BEGIN; >>> insert into tt1 values(1); >>> declare c50 cursor for select * from tt1; >>> insert into tt1 values(2); >>> declare c51 cursor for select * from tt1; >>> insert into tt1 values(3); >>> fetch all from c50; >>> fetch all from c51; >>> COMMIT; >>> truncate table tt1; >>> >>> >>> BEGIN; >>> insert into tt1 values(1); >>> declare c50 cursor for select * from tt1; >>> declare c51 cursor for select * from tt1; >>> insert into tt1 values(2); >>> insert into tt1 
values(3);
>>> fetch all from c50;
>>> fetch all from c51;
>>> COMMIT;
>>> truncate table tt1;
>>>
>>>
>>> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt <abb...@en...> wrote:
>>>
>>>> Hi,
>>>>
>>>> In a multi-statement transaction, each statement is given a command
>>>> identifier, starting from zero and incrementing for each statement.
>>>> These command identifiers are required for extra tracking because each
>>>> statement has its own visibility rules within the transaction.
>>>> For example, a cursor’s contents must remain unchanged even if later
>>>> statements in the same transaction modify rows. Such tracking is
>>>> implemented using the system command id columns cmin/cmax, which are
>>>> internally a single column.
>>>>
>>>> cmin/cmax come into play only in multi-statement transactions;
>>>> they are both zero otherwise.
>>>>
>>>> cmin "The command identifier of the statement within the inserting transaction."
>>>> cmax "The command identifier of the statement within the deleting transaction."
>>>>
>>>> Here are the visibility rules (taken from the comments in tqual.c):
>>>>
>>>> ( // A heap tuple is valid "now" iff
>>>>   Xmin == my-transaction &&       // inserted by the current transaction
>>>>   Cmin < my-command &&            // before this command, and
>>>>   (
>>>>     Xmax is null ||               // the row has not been deleted, or
>>>>     (
>>>>       Xmax == my-transaction &&   // it was deleted by the current transaction
>>>>       Cmax >= my-command          // but not before this command,
>>>>     )
>>>>   )
>>>> )
>>>> ||                                // or
>>>> (
>>>>   Xmin is committed &&            // the row was inserted by a committed transaction, and
>>>>   (
>>>>     Xmax is null ||               // the row has not been deleted, or
>>>>     (
>>>>       Xmax == my-transaction &&   // the row is being deleted by this transaction
>>>>       Cmax >= my-command) ||      // but it's not deleted "yet", or
>>>>     (
>>>>       Xmax != my-transaction &&   // the row was deleted by another transaction
>>>>       Xmax is not committed       // that has not been committed
>>>>     )
>>>>   )
>>>> )
>>>> )
>>>>
>>>> Because cmin and cmax are internally a single system column,
>>>> it is not possible to simply record the status of a row
>>>> that is created and expired in the same multi-statement transaction.
>>>> For that reason, a special combo command id is created that references
>>>> a local memory hash that contains the actual cmin and cmax values.
>>>> This means that when a combo id is in use, the number we see is
>>>> neither the cmin nor the cmax; it is an index into a local
>>>> array of structures that hold the actual cmin and cmax values.
>>>>
>>>> The following queries (taken mostly from
>>>> http://momjian.us/main/writings/pgsql/mvcc.pdf)
>>>> use the contrib module pageinspect, which exposes
>>>> internal heap page structures and all stored rows,
>>>> including those not visible in the current snapshot.
>>>> (Bit 0x0020 is defined as HEAP_COMBOCID.)
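The visibility rules quoted above can be sketched as a small predicate. This is only a literal transcription of the quoted comment into Python, not PostgreSQL's actual implementation; the names `HeapTuple`, `visible_now`, `my_xid`, `my_cid` and the `committed` set are assumptions made for illustration, and a missing xmax is modelled as `None` rather than 0.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class HeapTuple:
    # Hypothetical, simplified tuple header (illustration only).
    xmin: int                   # inserting transaction id
    cmin: int                   # command id of the inserting statement
    xmax: Optional[int] = None  # deleting transaction id, None if not deleted
    cmax: Optional[int] = None  # command id of the deleting statement


def visible_now(t: HeapTuple, my_xid: int, my_cid: int, committed: Set[int]) -> bool:
    """Transcribe the quoted rules; `committed` is the set of committed xids."""
    # Deleted by this transaction, but not before the current command.
    deleted_by_me_later = (
        t.xmax == my_xid and t.cmax is not None and t.cmax >= my_cid
    )
    # First branch: inserted by the current transaction before this command.
    inserted_by_me = (
        t.xmin == my_xid
        and t.cmin < my_cid
        and (t.xmax is None or deleted_by_me_later)
    )
    # Second branch: inserted by a committed transaction.
    inserted_by_committed = t.xmin in committed and (
        t.xmax is None
        or deleted_by_me_later
        or (t.xmax != my_xid and t.xmax not in committed)
    )
    return inserted_by_me or inserted_by_committed


# A snapshot taken at command id 1 inside transaction 100:
print(visible_now(HeapTuple(xmin=100, cmin=0), 100, 1, set()))  # True
print(visible_now(HeapTuple(xmin=100, cmin=2), 100, 1, set()))  # False
```

The two calls at the end mirror the cursor requirement stated earlier in the mail: a row inserted before the snapshot's command id is visible, and a row inserted by a later statement of the same transaction is not.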
>>>> >>>> We are exploring 3 examples here: >>>> 1) INSERT & DELETE in a single transaction >>>> 2) INSERT & UPDATE in a single transaction >>>> 3) INSERT from two different transactions & UPDATE from one >>>> >>>> test=# drop table mvcc_demo; >>>> DROP TABLE >>>> test=# >>>> test=# create table mvcc_demo (val int); >>>> CREATE TABLE >>>> test=# >>>> test=# TRUNCATE mvcc_demo; >>>> TRUNCATE TABLE >>>> test=# >>>> test=# BEGIN; >>>> BEGIN >>>> test=# DELETE FROM mvcc_demo; -- increment command id to show that >>>> combo id would be different >>>> DELETE 0 >>>> test=# DELETE FROM mvcc_demo; >>>> DELETE 0 >>>> test=# DELETE FROM mvcc_demo; >>>> DELETE 0 >>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>> INSERT 0 1 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+------+-----------+------------- >>>> 80685 | 0 | 3 | f >>>> 80685 | 0 | 4 | f >>>> 80685 | 0 | 5 | f >>>> (3 rows) >>>> >>>> test=# >>>> test=# DELETE FROM mvcc_demo; >>>> DELETE 3 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+-------+-----------+------------- >>>> 80685 | 80685 | 0 | t >>>> 80685 | 80685 | 1 | t >>>> 80685 | 80685 | 2 | t >>>> (3 rows) >>>> >>>> Note that since is_combocid is true the numbers are not cmin/cmax they >>>> are actually >>>> the indexes of the internal array 
already explained above. >>>> combo id index 0 would contain cmin 3, cmax 6 >>>> combo id index 1 would contain cmin 4, cmax 6 >>>> combo id index 2 would contain cmin 5, cmax 6 >>>> >>>> test=# >>>> test=# END; >>>> COMMIT >>>> test=# >>>> test=# >>>> test=# TRUNCATE mvcc_demo; >>>> TRUNCATE TABLE >>>> test=# >>>> test=# >>>> test=# >>>> test=# BEGIN; >>>> BEGIN >>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>> INSERT 0 1 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+------+-----------+------------- >>>> 80675 | 0 | 0 | f >>>> 80675 | 0 | 1 | f >>>> 80675 | 0 | 2 | f >>>> (3 rows) >>>> >>>> test=# >>>> test=# UPDATE mvcc_demo SET val = val * 10; >>>> UPDATE 3 >>>> test=# >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+-------+-----------+------------- >>>> 80675 | 80675 | 0 | t >>>> 80675 | 80675 | 1 | t >>>> 80675 | 80675 | 2 | t >>>> 80675 | 0 | 3 | f >>>> 80675 | 0 | 3 | f >>>> 80675 | 0 | 3 | f >>>> (6 rows) >>>> >>>> test=# >>>> test=# END; >>>> COMMIT >>>> test=# >>>> test=# >>>> test=# TRUNCATE mvcc_demo; >>>> TRUNCATE TABLE >>>> test=# >>>> >>>> -- From one psql issue >>>> test=# INSERT INTO mvcc_demo VALUES (1); >>>> INSERT 0 1 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 
AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+------+-----------+------------- >>>> 80677 | 0 | 0 | f >>>> (1 row) >>>> >>>> >>>> test=# -- From another issue >>>> test=# BEGIN; >>>> BEGIN >>>> test=# INSERT INTO mvcc_demo VALUES (2); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (3); >>>> INSERT 0 1 >>>> test=# INSERT INTO mvcc_demo VALUES (4); >>>> INSERT 0 1 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+------+-----------+------------- >>>> 80677 | 0 | 0 | f >>>> 80678 | 0 | 0 | f >>>> 80678 | 0 | 1 | f >>>> 80678 | 0 | 2 | f >>>> (4 rows) >>>> >>>> test=# >>>> test=# UPDATE mvcc_demo SET val = val * 10; >>>> UPDATE 4 >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+-------+-----------+------------- >>>> 80678 | 80678 | 0 | t >>>> 80678 | 80678 | 1 | t >>>> 80678 | 80678 | 2 | t >>>> 80677 | 80678 | 3 | f >>>> 80678 | 0 | 3 | f >>>> 80678 | 0 | 3 | f >>>> 80678 | 0 | 3 | f >>>> 80678 | 0 | 3 | f >>>> (8 rows) >>>> >>>> test=# >>>> >>>> test=# -- Before finishing this, issue these from the first psql >>>> test=# SELECT t_xmin AS xmin, >>>> test-# t_xmax::text::int8 AS xmax, >>>> test-# t_field3::text::int8 AS cmin_cmax, >>>> test-# (t_infomask::integer & 
X'0020'::integer)::bool AS >>>> is_combocid >>>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>>> test-# ORDER BY 2 DESC, 3; >>>> xmin | xmax | cmin_cmax | is_combocid >>>> -------+-------+-----------+------------- >>>> 80678 | 80678 | 0 | t >>>> 80678 | 80678 | 1 | t >>>> 80678 | 80678 | 2 | t >>>> 80677 | 80678 | 3 | f >>>> 80678 | 0 | 3 | f >>>> 80678 | 0 | 3 | f >>>> 80678 | 0 | 3 | f >>>> 80678 | 0 | 3 | f >>>> (8 rows) >>>> >>>> test=# END; >>>> COMMIT >>>> >>>> >>>> Now consider the case we are trying to solve >>>> >>>> drop table tt1; >>>> create table tt1(f1 int); >>>> >>>> BEGIN; >>>> insert into tt1 values(1); >>>> declare c50 cursor for select * from tt1; -- should show one row only >>>> insert into tt1 values(2); >>>> fetch all from c50; >>>> COMMIT; >>>> >>>> >>>> Consider Data node 1 log >>>> >>>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>> committed READ WRITE] >>>> (b) [exec_simple_query][1026][drop table tt1;] >>>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] >>>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] >>>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>> committed READ WRITE] >>>> (f) [exec_simple_query][1026][create table tt1(f1 int);] >>>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] >>>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] >>>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>>> committed READ WRITE] >>>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] >>>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] >>>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() FROM >>>> tt1] >>>> (m) [exec_simple_query][1026][COMMIT TRANSACTION] >>>> >>>> The cursor currently shows both inserted rows because command id at >>>> data node in >>>> step (j) is 0 >>>> step (k) is 1 & >>>> step (l) is 2 >>>> >>>> Where as we need command ids to be >>>> >>>> step (j) 
should be 0 >>>> step (k) should be 2 & >>>> step (l) should be 1 >>>> >>>> This will solve the cursor visibility problem. >>>> >>>> To implement this I suggest we send command IDs to data nodes from the >>>> coordinator >>>> like we send gxid. The only difference will be that we do not need to >>>> take command IDs >>>> from GTM since they are only valid with in the transaction. >>>> >>>> See this example >>>> >>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>> xmin | xmax | cmin | cmax | f1 >>>> ------+------+------+------+---- >>>> (0 rows) >>>> >>>> test=# begin; >>>> BEGIN >>>> test=# insert into tt1 values(1); >>>> INSERT 0 1 >>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>> xmin | xmax | cmin | cmax | f1 >>>> -------+------+------+------+---- >>>> 80615 | 0 | 0 | 0 | 1 >>>> (1 row) >>>> >>>> test=# insert into tt1 values(2); >>>> INSERT 0 1 >>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>> xmin | xmax | cmin | cmax | f1 >>>> -------+------+------+------+---- >>>> 80615 | 0 | 0 | 0 | 1 >>>> 80615 | 0 | 1 | 1 | 2 >>>> (2 rows) >>>> >>>> test=# insert into tt1 values(3); >>>> INSERT 0 1 >>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>> xmin | xmax | cmin | cmax | f1 >>>> -------+------+------+------+---- >>>> 80615 | 0 | 0 | 0 | 1 >>>> 80615 | 0 | 1 | 1 | 2 >>>> 80615 | 0 | 2 | 2 | 3 >>>> (3 rows) >>>> >>>> test=# insert into tt1 values(4); >>>> INSERT 0 1 >>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>> xmin | xmax | cmin | cmax | f1 >>>> -------+------+------+------+---- >>>> 80615 | 0 | 0 | 0 | 1 >>>> 80615 | 0 | 1 | 1 | 2 >>>> 80615 | 0 | 2 | 2 | 3 >>>> 80615 | 0 | 3 | 3 | 4 >>>> (4 rows) >>>> >>>> test=# end; >>>> COMMIT >>>> test=# >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>> xmin | xmax | cmin | cmax | f1 >>>> -------+------+------+------+---- >>>> 80615 | 0 | 0 | 0 | 1 >>>> 80615 | 0 | 1 | 1 | 2 >>>> 80615 | 0 | 2 | 2 | 3 >>>> 80615 | 0 | 3 | 3 | 4 >>>> (4 rows) >>>> >>>> test=# insert into tt1 
values(5); >>>> INSERT 0 1 >>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>> xmin | xmax | cmin | cmax | f1 >>>> -------+------+------+------+---- >>>> 80615 | 0 | 0 | 0 | 1 >>>> 80615 | 0 | 1 | 1 | 2 >>>> 80615 | 0 | 2 | 2 | 3 >>>> 80615 | 0 | 3 | 3 | 4 >>>> 80616 | 0 | 0 | 0 | 5 >>>> (5 rows) >>>> >>>> test=# insert into tt1 values(6); >>>> INSERT 0 1 >>>> test=# >>>> test=# >>>> test=# select xmin,xmax,cmin,cmax,* from tt1; >>>> xmin | xmax | cmin | cmax | f1 >>>> -------+------+------+------+---- >>>> 80615 | 0 | 0 | 0 | 1 >>>> 80615 | 0 | 1 | 1 | 2 >>>> 80615 | 0 | 2 | 2 | 3 >>>> 80615 | 0 | 3 | 3 | 4 >>>> 80616 | 0 | 0 | 0 | 5 >>>> 80617 | 0 | 0 | 0 | 6 >>>> (6 rows) >>>> >>>> Note that at the end of the multi-statement transaction the command id >>>> gets reset to zero. >>>> >>>> -- >>>> Abbas >>>> Architect >>>> EnterpriseDB Corporation >>>> The Enterprise PostgreSQL Company >>> >>> >>> >>> >>> -- >>> -- >>> Abbas >>> Architect >>> EnterpriseDB Corporation >>> The Enterprise PostgreSQL Company >>> >>> Phone: 92-334-5100153 >>> >>> Website: www.enterprisedb.com >>> EnterpriseDB Blog: http://blogs.enterprisedb.com/ >>> Follow us on Twitter: http://www.twitter.com/enterprisedb >>> >>> This e-mail message (and any attachment) is intended for the use of >>> the individual or entity to whom it is addressed. This message >>> contains information from EnterpriseDB Corporation that may be >>> privileged, confidential, or exempt from disclosure under applicable >>> law. If you are not the intended recipient or authorized to receive >>> this for the intended recipient, any use, dissemination, distribution, >>> retention, archiving, or copying of this communication is strictly >>> prohibited. If you have received this e-mail in error, please notify >>> the sender immediately by reply e-mail and delete this message. 
-- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
From: Abbas B. <abb...@en...> - 2012-06-19 08:26:16
|
Hi Ashutosh, Here are the results with the val column, Thanks. test=# drop table mvcc_demo; DROP TABLE test=# test=# create table mvcc_demo (val int); CREATE TABLE test=# test=# TRUNCATE mvcc_demo; TRUNCATE TABLE test=# test=# BEGIN; BEGIN test=# DELETE FROM mvcc_demo; -- increment command id to show that combo id would be different DELETE 0 test=# DELETE FROM mvcc_demo; DELETE 0 test=# DELETE FROM mvcc_demo; DELETE 0 test=# INSERT INTO mvcc_demo VALUES (1); INSERT 0 1 test=# INSERT INTO mvcc_demo VALUES (2); INSERT 0 1 test=# INSERT INTO mvcc_demo VALUES (3); INSERT 0 1 test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+------+-----------+------------- 80689 | 0 | 3 | f 80689 | 0 | 4 | f 80689 | 0 | 5 | f (3 rows) test=# test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; xmin | xmax | cmin | cmax | val -------+------+------+------+----- 80689 | 0 | 3 | 3 | 1 80689 | 0 | 4 | 4 | 2 80689 | 0 | 5 | 5 | 3 (3 rows) test=# test=# DELETE FROM mvcc_demo; DELETE 3 test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+-------+-----------+------------- 80689 | 80689 | 0 | t 80689 | 80689 | 1 | t 80689 | 80689 | 2 | t (3 rows) test=# test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; xmin | xmax | cmin | cmax | val ------+------+------+------+----- (0 rows) test=# test=# END; COMMIT test=# test=# test=# TRUNCATE mvcc_demo; TRUNCATE TABLE test=# BEGIN; BEGIN test=# INSERT INTO mvcc_demo VALUES (1); INSERT 0 1 test=# INSERT INTO mvcc_demo VALUES (2); 
INSERT 0 1 test=# INSERT INTO mvcc_demo VALUES (3); INSERT 0 1 test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+------+-----------+------------- 80693 | 0 | 0 | f 80693 | 0 | 1 | f 80693 | 0 | 2 | f (3 rows) test=# test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; xmin | xmax | cmin | cmax | val -------+------+------+------+----- 80693 | 0 | 0 | 0 | 1 80693 | 0 | 1 | 1 | 2 80693 | 0 | 2 | 2 | 3 (3 rows) test=# test=# UPDATE mvcc_demo SET val = 10; UPDATE 3 test=# test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+-------+-----------+------------- 80693 | 80693 | 0 | t 80693 | 80693 | 1 | t 80693 | 80693 | 2 | t 80693 | 0 | 3 | f 80693 | 0 | 3 | f 80693 | 0 | 3 | f (6 rows) test=# test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; xmin | xmax | cmin | cmax | val -------+------+------+------+----- 80693 | 0 | 3 | 3 | 10 80693 | 0 | 3 | 3 | 10 80693 | 0 | 3 | 3 | 10 (3 rows) test=# test=# END; COMMIT test=# test=# TRUNCATE mvcc_demo; TRUNCATE TABLE -- From one psql issue test=# INSERT INTO mvcc_demo VALUES (1); INSERT 0 1 test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+------+-----------+------------- 80699 | 0 | 0 | f (1 row) test=# test=# select xmin,xmax,cmin,cmax,* from 
mvcc_demo order by val; xmin | xmax | cmin | cmax | val -------+------+------+------+----- 80699 | 0 | 0 | 0 | 1 (1 row) test=# -- From another issue test=# BEGIN; BEGIN test=# INSERT INTO mvcc_demo VALUES (2); INSERT 0 1 test=# INSERT INTO mvcc_demo VALUES (3); INSERT 0 1 test=# INSERT INTO mvcc_demo VALUES (4); INSERT 0 1 test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+------+-----------+------------- 80699 | 0 | 0 | f 80700 | 0 | 0 | f 80700 | 0 | 1 | f 80700 | 0 | 2 | f (4 rows) test=# test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; xmin | xmax | cmin | cmax | val -------+------+------+------+----- 80699 | 0 | 0 | 0 | 1 80700 | 0 | 0 | 0 | 2 80700 | 0 | 1 | 1 | 3 80700 | 0 | 2 | 2 | 4 (4 rows) test=# test=# UPDATE mvcc_demo SET val = 10; UPDATE 4 test=# test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+-------+-----------+------------- 80700 | 80700 | 0 | t 80700 | 80700 | 1 | t 80700 | 80700 | 2 | t 80699 | 80700 | 3 | f 80700 | 0 | 3 | f 80700 | 0 | 3 | f 80700 | 0 | 3 | f 80700 | 0 | 3 | f (8 rows) test=# test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; xmin | xmax | cmin | cmax | val -------+------+------+------+----- 80700 | 0 | 3 | 3 | 10 80700 | 0 | 3 | 3 | 10 80700 | 0 | 3 | 3 | 10 80700 | 0 | 3 | 3 | 10 (4 rows) test=# -- Before finishing this, issue these from the first psql test=# SELECT t_xmin AS xmin, test-# t_xmax::text::int8 AS xmax, test-# t_field3::text::int8 AS cmin_cmax, test-# (t_infomask::integer & 
X'0020'::integer)::bool AS is_combocid test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) test-# ORDER BY 2 DESC, 3; xmin | xmax | cmin_cmax | is_combocid -------+-------+-----------+------------- 80700 | 80700 | 0 | t 80700 | 80700 | 1 | t 80700 | 80700 | 2 | t 80699 | 80700 | 3 | f 80700 | 0 | 3 | f 80700 | 0 | 3 | f 80700 | 0 | 3 | f 80700 | 0 | 3 | f (8 rows) test=# test=# select xmin,xmax,cmin,cmax,* from mvcc_demo order by val; xmin | xmax | cmin | cmax | val -------+-------+------+------+----- 80699 | 80700 | 3 | 3 | 1 (1 row) test=# end; COMMIT On Tue, Jun 19, 2012 at 10:26 AM, Michael Paquier <mic...@gm... > wrote: > Hi, > > I expect pgxc_node_send_cmd_id to have some impact on performance, so be > sure to send it to remote Datanodes really only if necessary. > You should put more severe conditions blocking this function cid can > easily get incremented in Postgres. > > Regards, > > On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt <abb...@en...>wrote: > >> PFA a WIP patch implementing the design presented earlier. >> The patch is WIP because it still has and FIXME and it shows some >> regression failures that need to be fixed, but other than that it confirms >> that the suggested design would work fine. 
The following test cases now work fine:
>>
>> drop table tt1;
>> create table tt1(f1 int) distribute by replication;
>>
>>
>> BEGIN;
>> insert into tt1 values(1);
>> declare c50 cursor for select * from tt1;
>> insert into tt1 values(2);
>> fetch all from c50;
>> COMMIT;
>> truncate table tt1;
>>
>> BEGIN;
>>
>> declare c50 cursor for select * from tt1;
>> insert into tt1 values(1);
>>
>> insert into tt1 values(2);
>> fetch all from c50;
>> COMMIT;
>> truncate table tt1;
>>
>>
>> BEGIN;
>> insert into tt1 values(1);
>> insert into tt1 values(2);
>>
>> declare c50 cursor for select * from tt1;
>> insert into tt1 values(3);
>>
>> fetch all from c50;
>> COMMIT;
>> truncate table tt1;
>>
>>
>> BEGIN;
>> insert into tt1 values(1);
>> declare c50 cursor for select * from tt1;
>> insert into tt1 values(2);
>> declare c51 cursor for select * from tt1;
>> insert into tt1 values(3);
>> fetch all from c50;
>> fetch all from c51;
>> COMMIT;
>> truncate table tt1;
>>
>>
>> BEGIN;
>> insert into tt1 values(1);
>> declare c50 cursor for select * from tt1;
>> declare c51 cursor for select * from tt1;
>> insert into tt1 values(2);
>> insert into tt1 values(3);
>> fetch all from c50;
>> fetch all from c51;
>> COMMIT;
>> truncate table tt1;
>>
>>
>> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt <abb...@en...> wrote:
>>
>>> Hi,
>>>
>>> In a multi-statement transaction, each statement is given a command
>>> identifier, starting from zero and incrementing for each statement.
>>> These command identifiers are required for extra tracking because each
>>> statement has its own visibility rules within the transaction.
>>> For example, a cursor’s contents must remain unchanged even if later
>>> statements in the same transaction modify rows. Such tracking is
>>> implemented using the system command id columns cmin/cmax, which are
>>> internally a single column.
>>>
>>> cmin/cmax come into play only in multi-statement transactions;
>>> they are both zero otherwise.
>>>
>>> cmin "The command identifier of the statement within the inserting transaction."
>>> cmax "The command identifier of the statement within the deleting transaction."
>>>
>>> Here are the visibility rules (taken from the comments in tqual.c):
>>>
>>> ( // A heap tuple is valid "now" iff
>>>   Xmin == my-transaction &&       // inserted by the current transaction
>>>   Cmin < my-command &&            // before this command, and
>>>   (
>>>     Xmax is null ||               // the row has not been deleted, or
>>>     (
>>>       Xmax == my-transaction &&   // it was deleted by the current transaction
>>>       Cmax >= my-command          // but not before this command,
>>>     )
>>>   )
>>> )
>>> ||                                // or
>>> (
>>>   Xmin is committed &&            // the row was inserted by a committed transaction, and
>>>   (
>>>     Xmax is null ||               // the row has not been deleted, or
>>>     (
>>>       Xmax == my-transaction &&   // the row is being deleted by this transaction
>>>       Cmax >= my-command) ||      // but it's not deleted "yet", or
>>>     (
>>>       Xmax != my-transaction &&   // the row was deleted by another transaction
>>>       Xmax is not committed       // that has not been committed
>>>     )
>>>   )
>>> )
>>> )
>>>
>>> Because cmin and cmax are internally a single system column,
>>> it is not possible to simply record the status of a row
>>> that is created and expired in the same multi-statement transaction.
>>> For that reason, a special combo command id is created that references
>>> a local memory hash that contains the actual cmin and cmax values.
>>> This means that when a combo id is in use, the number we see is
>>> neither the cmin nor the cmax; it is an index into a local
>>> array of structures that hold the actual cmin and cmax values.
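The combo command id indirection described above can be sketched roughly as follows. This is a minimal model under stated assumptions, not the backend's real data structure: the class name `ComboCidTable` and its methods are hypothetical, and the (cmin, cmax) pairs used at the end are example values chosen for illustration.

```python
from typing import Dict, List, Tuple


class ComboCidTable:
    """Hypothetical per-backend table mapping a combo id to (cmin, cmax)."""

    def __init__(self) -> None:
        self._pairs: List[Tuple[int, int]] = []       # combo id -> (cmin, cmax)
        self._index: Dict[Tuple[int, int], int] = {}  # (cmin, cmax) -> combo id

    def get_combo_cid(self, cmin: int, cmax: int) -> int:
        """Return the combo id for this pair, allocating a new one if needed."""
        key = (cmin, cmax)
        if key not in self._index:
            self._index[key] = len(self._pairs)
            self._pairs.append(key)
        return self._index[key]

    def cmin(self, combo_cid: int) -> int:
        return self._pairs[combo_cid][0]

    def cmax(self, combo_cid: int) -> int:
        return self._pairs[combo_cid][1]


# Rows inserted at command ids 3, 4 and 5, then deleted at command id 6
# in the same transaction (example values only).
table = ComboCidTable()
combo_ids = [table.get_combo_cid(cmin, 6) for cmin in (3, 4, 5)]
print(combo_ids)                       # [0, 1, 2]
print(table.cmin(1), table.cmax(1))    # 4 6
```

The value stored in the tuple header is the small index, so a reader has to consult the local table to recover the real cmin and cmax. Because the table is local to one backend, another session looking at the same page cannot interpret the index.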
>>> >>> The following queries (taken mostly from >>> http://momjian.us/main/writings/pgsql/mvcc.pdf) >>> use the contrib module pageinspect, which allows >>> visibility of internal heap page structures and all stored rows, >>> including those not visible in the current snapshot. >>> (Bit 0x0020 is defined as HEAP_COMBOCID.) >>> >>> We are exploring 3 examples here: >>> 1) INSERT & DELETE in a single transaction >>> 2) INSERT & UPDATE in a single transaction >>> 3) INSERT from two different transactions & UPDATE from one >>> >>> test=# drop table mvcc_demo; >>> DROP TABLE >>> test=# >>> test=# create table mvcc_demo (val int); >>> CREATE TABLE >>> test=# >>> test=# TRUNCATE mvcc_demo; >>> TRUNCATE TABLE >>> test=# >>> test=# BEGIN; >>> BEGIN >>> test=# DELETE FROM mvcc_demo; -- increment command id to show that combo >>> id would be different >>> DELETE 0 >>> test=# DELETE FROM mvcc_demo; >>> DELETE 0 >>> test=# DELETE FROM mvcc_demo; >>> DELETE 0 >>> test=# INSERT INTO mvcc_demo VALUES (1); >>> INSERT 0 1 >>> test=# INSERT INTO mvcc_demo VALUES (2); >>> INSERT 0 1 >>> test=# INSERT INTO mvcc_demo VALUES (3); >>> INSERT 0 1 >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+------+-----------+------------- >>> 80685 | 0 | 3 | f >>> 80685 | 0 | 4 | f >>> 80685 | 0 | 5 | f >>> (3 rows) >>> >>> test=# >>> test=# DELETE FROM mvcc_demo; >>> DELETE 3 >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | 
xmax | cmin_cmax | is_combocid >>> -------+-------+-----------+------------- >>> 80685 | 80685 | 0 | t >>> 80685 | 80685 | 1 | t >>> 80685 | 80685 | 2 | t >>> (3 rows) >>> >>> Note that since is_combocid is true the numbers are not cmin/cmax they >>> are actually >>> the indexes of the internal array already explained above. >>> combo id index 0 would contain cmin 3, cmax 6 >>> combo id index 1 would contain cmin 4, cmax 6 >>> combo id index 2 would contain cmin 5, cmax 6 >>> >>> test=# >>> test=# END; >>> COMMIT >>> test=# >>> test=# >>> test=# TRUNCATE mvcc_demo; >>> TRUNCATE TABLE >>> test=# >>> test=# >>> test=# >>> test=# BEGIN; >>> BEGIN >>> test=# INSERT INTO mvcc_demo VALUES (1); >>> INSERT 0 1 >>> test=# INSERT INTO mvcc_demo VALUES (2); >>> INSERT 0 1 >>> test=# INSERT INTO mvcc_demo VALUES (3); >>> INSERT 0 1 >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+------+-----------+------------- >>> 80675 | 0 | 0 | f >>> 80675 | 0 | 1 | f >>> 80675 | 0 | 2 | f >>> (3 rows) >>> >>> test=# >>> test=# UPDATE mvcc_demo SET val = val * 10; >>> UPDATE 3 >>> test=# >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+-------+-----------+------------- >>> 80675 | 80675 | 0 | t >>> 80675 | 80675 | 1 | t >>> 80675 | 80675 | 2 | t >>> 80675 | 0 | 3 | f >>> 80675 | 0 | 3 | f >>> 80675 | 0 | 3 | f >>> (6 rows) >>> >>> test=# >>> test=# END; >>> COMMIT >>> test=# >>> test=# >>> test=# TRUNCATE 
mvcc_demo; >>> TRUNCATE TABLE >>> test=# >>> >>> -- From one psql issue >>> test=# INSERT INTO mvcc_demo VALUES (1); >>> INSERT 0 1 >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+------+-----------+------------- >>> 80677 | 0 | 0 | f >>> (1 row) >>> >>> >>> test=# -- From another issue >>> test=# BEGIN; >>> BEGIN >>> test=# INSERT INTO mvcc_demo VALUES (2); >>> INSERT 0 1 >>> test=# INSERT INTO mvcc_demo VALUES (3); >>> INSERT 0 1 >>> test=# INSERT INTO mvcc_demo VALUES (4); >>> INSERT 0 1 >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+------+-----------+------------- >>> 80677 | 0 | 0 | f >>> 80678 | 0 | 0 | f >>> 80678 | 0 | 1 | f >>> 80678 | 0 | 2 | f >>> (4 rows) >>> >>> test=# >>> test=# UPDATE mvcc_demo SET val = val * 10; >>> UPDATE 4 >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+-------+-----------+------------- >>> 80678 | 80678 | 0 | t >>> 80678 | 80678 | 1 | t >>> 80678 | 80678 | 2 | t >>> 80677 | 80678 | 3 | f >>> 80678 | 0 | 3 | f >>> 80678 | 0 | 3 | f >>> 80678 | 0 | 3 | f >>> 80678 | 0 | 3 | f >>> (8 rows) >>> >>> test=# >>> >>> test=# -- Before finishing this, issue these from 
the first psql >>> test=# SELECT t_xmin AS xmin, >>> test-# t_xmax::text::int8 AS xmax, >>> test-# t_field3::text::int8 AS cmin_cmax, >>> test-# (t_infomask::integer & X'0020'::integer)::bool AS >>> is_combocid >>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >>> test-# ORDER BY 2 DESC, 3; >>> xmin | xmax | cmin_cmax | is_combocid >>> -------+-------+-----------+------------- >>> 80678 | 80678 | 0 | t >>> 80678 | 80678 | 1 | t >>> 80678 | 80678 | 2 | t >>> 80677 | 80678 | 3 | f >>> 80678 | 0 | 3 | f >>> 80678 | 0 | 3 | f >>> 80678 | 0 | 3 | f >>> 80678 | 0 | 3 | f >>> (8 rows) >>> >>> test=# END; >>> COMMIT >>> >>> >>> Now consider the case we are trying to solve >>> >>> drop table tt1; >>> create table tt1(f1 int); >>> >>> BEGIN; >>> insert into tt1 values(1); >>> declare c50 cursor for select * from tt1; -- should show one row only >>> insert into tt1 values(2); >>> fetch all from c50; >>> COMMIT; >>> >>> >>> Consider Data node 1 log >>> >>> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>> committed READ WRITE] >>> (b) [exec_simple_query][1026][drop table tt1;] >>> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] >>> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] >>> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>> committed READ WRITE] >>> (f) [exec_simple_query][1026][create table tt1(f1 int);] >>> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] >>> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] >>> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >>> committed READ WRITE] >>> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] >>> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] >>> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() FROM >>> tt1] >>> (m) [exec_simple_query][1026][COMMIT TRANSACTION] >>> >>> The cursor currently shows both inserted rows because command id at data >>> node in >>> step 
(j) is 0
>>> step (k) is 1 &
>>> step (l) is 2
>>>
>>> Whereas we need command ids to be
>>>
>>> step (j) should be 0
>>> step (k) should be 2 &
>>> step (l) should be 1
>>>
>>> This will solve the cursor visibility problem.
>>>
>>> To implement this I suggest we send command IDs to data nodes from the
>>> coordinator, like we send gxid. The only difference will be that we do
>>> not need to take command IDs from GTM, since they are only valid within
>>> the transaction.
>>>
>>> See this example
>>>
>>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>>  xmin | xmax | cmin | cmax | f1
>>> ------+------+------+------+----
>>> (0 rows)
>>>
>>> test=# begin;
>>> BEGIN
>>> test=# insert into tt1 values(1);
>>> INSERT 0 1
>>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>>  xmin  | xmax | cmin | cmax | f1
>>> -------+------+------+------+----
>>>  80615 |    0 |    0 |    0 |  1
>>> (1 row)
>>>
>>> test=# insert into tt1 values(2);
>>> INSERT 0 1
>>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>>  xmin  | xmax | cmin | cmax | f1
>>> -------+------+------+------+----
>>>  80615 |    0 |    0 |    0 |  1
>>>  80615 |    0 |    1 |    1 |  2
>>> (2 rows)
>>>
>>> test=# insert into tt1 values(3);
>>> INSERT 0 1
>>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>>  xmin  | xmax | cmin | cmax | f1
>>> -------+------+------+------+----
>>>  80615 |    0 |    0 |    0 |  1
>>>  80615 |    0 |    1 |    1 |  2
>>>  80615 |    0 |    2 |    2 |  3
>>> (3 rows)
>>>
>>> test=# insert into tt1 values(4);
>>> INSERT 0 1
>>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>>  xmin  | xmax | cmin | cmax | f1
>>> -------+------+------+------+----
>>>  80615 |    0 |    0 |    0 |  1
>>>  80615 |    0 |    1 |    1 |  2
>>>  80615 |    0 |    2 |    2 |  3
>>>  80615 |    0 |    3 |    3 |  4
>>> (4 rows)
>>>
>>> test=# end;
>>> COMMIT
>>> test=#
>>> test=#
>>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>>  xmin  | xmax | cmin | cmax | f1
>>> -------+------+------+------+----
>>>  80615 |    0 |    0 |    0 |  1
>>>  80615 |    0 |    1 |    1 |  2
>>>  80615 |    0 |    2 |    2 |  3
>>>  80615 |    0 |    3 |    3 |  4
>>> (4 rows)
>>>
>>> test=# insert into tt1 values(5);
>>> INSERT 0 1
>>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>>  xmin  | xmax | cmin | cmax | f1
>>> -------+------+------+------+----
>>>  80615 |    0 |    0 |    0 |  1
>>>  80615 |    0 |    1 |    1 |  2
>>>  80615 |    0 |    2 |    2 |  3
>>>  80615 |    0 |    3 |    3 |  4
>>>  80616 |    0 |    0 |    0 |  5
>>> (5 rows)
>>>
>>> test=# insert into tt1 values(6);
>>> INSERT 0 1
>>> test=#
>>> test=#
>>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>>  xmin  | xmax | cmin | cmax | f1
>>> -------+------+------+------+----
>>>  80615 |    0 |    0 |    0 |  1
>>>  80615 |    0 |    1 |    1 |  2
>>>  80615 |    0 |    2 |    2 |  3
>>>  80615 |    0 |    3 |    3 |  4
>>>  80616 |    0 |    0 |    0 |  5
>>>  80617 |    0 |    0 |    0 |  6
>>> (6 rows)
>>>
>>> Note that at the end of the multi-statement transaction the command id
>>> gets reset to zero.
>>>
>>> --
>>> Abbas
>>> Architect
>>> EnterpriseDB Corporation
>>> The Enterprise PostgreSQL Company
>>
>> --
>> Abbas
>> Architect
>> EnterpriseDB Corporation
>> The Enterprise PostgreSQL Company
>>
>> _______________________________________________
>> Postgres-xc-developers mailing list
>> Pos...@li...
>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>
> --
> Michael Paquier
> http://michael.otacoo.com

--
Abbas
Architect
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
|
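
The command-id assignment discussed in this thread can be checked with a small model. The sketch below is Python, not PostgreSQL code. It implements only the first branch of the tqual.c rule quoted above (tuples inserted by the current transaction) and replays the c50 cursor case with the two command-id assignments for steps (j), (k) and (l). The names HeapTuple, visible_in_own_xact and cursor_rows are made up for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeapTuple:
    value: int
    xmin: int                    # inserting transaction
    cmin: int                    # command id of the inserting statement
    xmax: Optional[int] = None   # deleting transaction, if any
    cmax: Optional[int] = None   # command id of the deleting statement


def visible_in_own_xact(tup, my_xid, my_cid):
    """First branch of the rule: the tuple was inserted by my own transaction."""
    if tup.xmin != my_xid or not (tup.cmin < my_cid):
        return False
    if tup.xmax is None:
        return True              # not deleted
    # deleted by me, but not before this command
    return tup.xmax == my_xid and tup.cmax >= my_cid


XID = 80615  # arbitrary transaction id, as in the psql example


def cursor_rows(insert1_cid, insert2_cid, cursor_cid):
    """Rows the cursor's SELECT sees, given the command ids used on the datanode."""
    heap = [HeapTuple(1, XID, insert1_cid), HeapTuple(2, XID, insert2_cid)]
    return [t.value for t in heap if visible_in_own_xact(t, XID, cursor_cid)]


# Datanode numbers the statements in arrival order: (j)=0, (k)=1, (l)=2.
local_ids = cursor_rows(0, 1, 2)
# Coordinator-assigned ids: (j)=0, (k)=2, and the cursor's SELECT (l)=1.
coordinator_ids = cursor_rows(0, 2, 1)

print(local_ids)         # [1, 2]  both rows leak into the cursor
print(coordinator_ids)   # [1]     only the row inserted before DECLARE
```

With datanode-local numbering the SELECT behind the cursor runs as command 2 and sees both inserts; with the coordinator's numbering it runs as command 1 and the second insert (command 2) fails the Cmin < my-command test.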
From: Koichi S. <koi...@gm...> - 2012-06-19 08:20:15
|
Okay, so this is a bug fix which should be included in a minor release.
----------
Koichi Suzuki

2012/6/19 Michael Paquier <mic...@gm...>:
> On Tue, Jun 19, 2012 at 10:24 AM, Koichi Suzuki <koi...@gm...> wrote:
>>
>> Thanks a lot. Yes, it should be pushed down to gtm.
>>
>> Because this changes specs and current behavior, I'm not sure if it
>> should be included in a minor release.
>
> Yes, it changes the spec of gtm_ctl a little bit, but we should not forget
> that the current behavior is incorrect.
> Even if a log file is defined with gtm_ctl, the option is not taken into
> account by gtm, so committing that on 1.0 stable as well is not a big deal.
> --
> Michael Paquier
> http://michael.otacoo.com
|
From: Michael P. <mic...@gm...> - 2012-06-19 05:58:18
|
Hi all,

Please find attached an improved patch.
I corrected the following points:
- The storage table uses an access exclusive lock, meaning it cannot be
  accessed by other sessions in the cluster
- The redistributed table uses an exclusive lock; other sessions in the
  cluster can still read it with SELECT while redistribution is running
- Addition of an API to manage table locking
- Correction of bugs regarding session concurrency. An update in pgxc_class
  (update of distribution data) was not seen by concurrent sessions in the
  cluster.
- Doc correction and completion
- Regression fixes due to the grammar change for the node list in CTAS,
  CREATE TABLE, EXECUTE DIRECT and CLEAN CONNECTION
- Fix of system functions using EXECUTE DIRECT
- Fix for CTAS query generation
- The index of catalog pgxc_class is now updated when the catalog is updated
- Correct update of the relation cache when location data is updated

Questions are welcome. This patch can be applied on master and works as
expected.

On Mon, Jun 18, 2012 at 5:25 PM, Michael Paquier <mic...@gm...> wrote:

> Hi all,
>
> Based on the design above, I went to the end of my idea and took a day to
> write a prototype for online redistribution based on ALTER TABLE.
> It uses the grammar written in the previous mail with ADD NODE/DELETE
> NODE/DISTRIBUTE BY/TO NODE | GROUP.
>
> The main idea is the use of what I call a "storage" table, which is used as
> a temporary location for the data being distributed in the cluster.
> This table is created as unlogged.
>
> The patch sticks with the design described before:
> - Cached plans are dropped when redistribution is invoked
> - Vacuum is not necessary; this mechanism uses transaction-safe queries
> - For the time being, this implementation uses an exclusive lock, but given
>   how the redistribution is done, a ShareUpdateExclusive lock cannot be
>   ruled out.
> - Tables are reindexed if necessary.
> - Redistribution cannot be done inside a transaction block
> - Redistribution is not allowed together with any of the other commands,
>   as they are locally safe on each node.
> - No restrictions on the distribution types, table types or subclusters
>
> This feature can still be improved a lot, in particular for replicated
> tables when the list of nodes of the table is changed.
> It is one of the things I would like to improve, as it would really
> increase performance.
>
> Regards,
>
> --
> Michael Paquier
> http://michael.otacoo.com

--
Michael Paquier
http://michael.otacoo.com
|
From: Michael P. <mic...@gm...> - 2012-06-19 05:26:23
|
Hi,

I expect pgxc_node_send_cmd_id to have some impact on performance, so be sure
to send it to remote Datanodes only when really necessary.
You should put stricter conditions on calling this function, since cid can
easily get incremented in Postgres.

Regards,

On Tue, Jun 19, 2012 at 5:31 AM, Abbas Butt <abb...@en...> wrote:

> PFA a WIP patch implementing the design presented earlier.
> The patch is WIP because it still has a FIXME and it shows some
> regression failures that need to be fixed, but other than that it confirms
> that the suggested design would work fine. The following test cases now
> work fine
>
> drop table tt1;
> create table tt1(f1 int) distribute by replication;
>
> BEGIN;
> insert into tt1 values(1);
> declare c50 cursor for select * from tt1;
> insert into tt1 values(2);
> fetch all from c50;
> COMMIT;
> truncate table tt1;
>
> BEGIN;
> declare c50 cursor for select * from tt1;
> insert into tt1 values(1);
> insert into tt1 values(2);
> fetch all from c50;
> COMMIT;
> truncate table tt1;
>
> BEGIN;
> insert into tt1 values(1);
> insert into tt1 values(2);
> declare c50 cursor for select * from tt1;
> insert into tt1 values(3);
> fetch all from c50;
> COMMIT;
> truncate table tt1;
>
> BEGIN;
> insert into tt1 values(1);
> declare c50 cursor for select * from tt1;
> insert into tt1 values(2);
> declare c51 cursor for select * from tt1;
> insert into tt1 values(3);
> fetch all from c50;
> fetch all from c51;
> COMMIT;
> truncate table tt1;
>
> BEGIN;
> insert into tt1 values(1);
> declare c50 cursor for select * from tt1;
> declare c51 cursor for select * from tt1;
> insert into tt1 values(2);
> insert into tt1 values(3);
> fetch all from c50;
> fetch all from c51;
> COMMIT;
> truncate table tt1;
>
> On Fri, Jun 15, 2012 at 8:07 AM, Abbas Butt <abb...@en...> wrote:
>
>> Hi,
>>
>> In a multi-statement transaction each statement is given a command
>> identifier, starting from zero and incrementing for each statement.
>> These command identifiers are required for extra tracking because each
>> statement has its own visibility rules within the transaction.
>> For example, a cursor’s contents must remain unchanged even if later
>> statements in the same transaction modify rows. Such tracking is
>> implemented using the system command id columns cmin/cmax, which are
>> internally a single column.
>>
>> cmin/cmax come into play in multi-statement transactions only;
>> they are both zero otherwise.
>>
>> cmin "The command identifier of the statement within the inserting
>> transaction."
>> cmax "The command identifier of the statement within the deleting
>> transaction."
>>
>> Here are the visibility rules (taken from comments of tqual.c)
>>
>> (                                // A heap tuple is valid "now" iff
>>   Xmin == my-transaction &&      // inserted by the current transaction
>>   Cmin < my-command &&           // before this command, and
>>   (
>>     Xmax is null ||              // the row has not been deleted, or
>>     (
>>       Xmax == my-transaction &&  // it was deleted by the current transaction
>>       Cmax >= my-command         // but not before this command,
>>     )
>>   )
>> )
>> ||                               // or
>> (
>>   Xmin is committed &&           // the row was inserted by a committed transaction, and
>>   (
>>     Xmax is null ||              // the row has not been deleted, or
>>     (
>>       Xmax == my-transaction &&  // the row is being deleted by this transaction
>>       Cmax >= my-command         // but it's not deleted "yet", or
>>     ) ||
>>     (
>>       Xmax != my-transaction &&  // the row was deleted by another transaction
>>       Xmax is not committed      // that has not been committed
>>     )
>>   )
>> )
>>
>> Because cmin and cmax are internally a single system column,
>> it is not possible to simply record the status of a row
>> that is created and expired in the same multi-statement transaction.
>> For that reason, a special combo command id is created that references
>> a local memory hash that contains the actual cmin and cmax values.
>> This means that when a combo id is in use, the number we see is
>> neither cmin nor cmax; it is an index into a local array whose
>> entries hold the actual cmin and cmax values.
>>
>> The following queries (taken mostly from
>> http://momjian.us/main/writings/pgsql/mvcc.pdf)
>> use the contrib module pageinspect, which allows
>> visibility of internal heap page structures and all stored rows,
>> including those not visible in the current snapshot.
>> (Bit 0x0020 is defined as HEAP_COMBOCID.)
>>
>> We are exploring 3 examples here:
>> 1) INSERT & DELETE in a single transaction
>> 2) INSERT & UPDATE in a single transaction
>> 3) INSERT from two different transactions & UPDATE from one
>>
>> test=# drop table mvcc_demo;
>> DROP TABLE
>> test=#
>> test=# create table mvcc_demo (val int);
>> CREATE TABLE
>> test=#
>> test=# TRUNCATE mvcc_demo;
>> TRUNCATE TABLE
>> test=#
>> test=# BEGIN;
>> BEGIN
>> test=# DELETE FROM mvcc_demo; -- increment command id to show that combo id would be different
>> DELETE 0
>> test=# DELETE FROM mvcc_demo;
>> DELETE 0
>> test=# DELETE FROM mvcc_demo;
>> DELETE 0
>> test=# INSERT INTO mvcc_demo VALUES (1);
>> INSERT 0 1
>> test=# INSERT INTO mvcc_demo VALUES (2);
>> INSERT 0 1
>> test=# INSERT INTO mvcc_demo VALUES (3);
>> INSERT 0 1
>> test=# SELECT t_xmin AS xmin,
>> test-#        t_xmax::text::int8 AS xmax,
>> test-#        t_field3::text::int8 AS cmin_cmax,
>> test-#        (t_infomask::integer & X'0020'::integer)::bool AS is_combocid
>> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0))
>> test-# ORDER BY 2 DESC, 3;
>>  xmin  | xmax | cmin_cmax | is_combocid
>> -------+------+-----------+-------------
>>  80685 |    0 |         3 | f
>>  80685 |    0 |         4 | f
>>  80685 |    0 |         5 | f
>> (3 rows)
>>
>> test=#
>> test=# DELETE FROM mvcc_demo;
>> DELETE 3
>> test=# SELECT t_xmin AS xmin,
>> test-#        t_xmax::text::int8 AS xmax,
>> test-#        t_field3::text::int8 AS cmin_cmax,
>> test-#        (t_infomask::integer
& X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+-------+-----------+------------- >> 80685 | 80685 | 0 | t >> 80685 | 80685 | 1 | t >> 80685 | 80685 | 2 | t >> (3 rows) >> >> Note that since is_combocid is true the numbers are not cmin/cmax they >> are actually >> the indexes of the internal array already explained above. >> combo id index 0 would contain cmin 3, cmax 6 >> combo id index 1 would contain cmin 4, cmax 6 >> combo id index 2 would contain cmin 5, cmax 6 >> >> test=# >> test=# END; >> COMMIT >> test=# >> test=# >> test=# TRUNCATE mvcc_demo; >> TRUNCATE TABLE >> test=# >> test=# >> test=# >> test=# BEGIN; >> BEGIN >> test=# INSERT INTO mvcc_demo VALUES (1); >> INSERT 0 1 >> test=# INSERT INTO mvcc_demo VALUES (2); >> INSERT 0 1 >> test=# INSERT INTO mvcc_demo VALUES (3); >> INSERT 0 1 >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+------+-----------+------------- >> 80675 | 0 | 0 | f >> 80675 | 0 | 1 | f >> 80675 | 0 | 2 | f >> (3 rows) >> >> test=# >> test=# UPDATE mvcc_demo SET val = val * 10; >> UPDATE 3 >> test=# >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+-------+-----------+------------- >> 80675 | 80675 | 0 | t >> 80675 | 80675 | 1 | t >> 80675 | 80675 | 2 | t >> 80675 | 0 | 3 | f >> 80675 | 0 | 3 | f >> 80675 | 0 | 3 | f >> (6 rows) >> >> 
test=# >> test=# END; >> COMMIT >> test=# >> test=# >> test=# TRUNCATE mvcc_demo; >> TRUNCATE TABLE >> test=# >> >> -- From one psql issue >> test=# INSERT INTO mvcc_demo VALUES (1); >> INSERT 0 1 >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+------+-----------+------------- >> 80677 | 0 | 0 | f >> (1 row) >> >> >> test=# -- From another issue >> test=# BEGIN; >> BEGIN >> test=# INSERT INTO mvcc_demo VALUES (2); >> INSERT 0 1 >> test=# INSERT INTO mvcc_demo VALUES (3); >> INSERT 0 1 >> test=# INSERT INTO mvcc_demo VALUES (4); >> INSERT 0 1 >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+------+-----------+------------- >> 80677 | 0 | 0 | f >> 80678 | 0 | 0 | f >> 80678 | 0 | 1 | f >> 80678 | 0 | 2 | f >> (4 rows) >> >> test=# >> test=# UPDATE mvcc_demo SET val = val * 10; >> UPDATE 4 >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+-------+-----------+------------- >> 80678 | 80678 | 0 | t >> 80678 | 80678 | 1 | t >> 80678 | 80678 | 2 | t >> 80677 | 80678 | 3 | f >> 80678 | 0 | 3 | f >> 80678 | 0 | 3 | f >> 80678 | 0 | 3 | f >> 80678 | 0 | 3 | f >> (8 rows) >> >> test=# >> >> test=# -- Before finishing this, issue these 
from the first psql >> test=# SELECT t_xmin AS xmin, >> test-# t_xmax::text::int8 AS xmax, >> test-# t_field3::text::int8 AS cmin_cmax, >> test-# (t_infomask::integer & X'0020'::integer)::bool AS >> is_combocid >> test-# FROM heap_page_items(get_raw_page('mvcc_demo', 0)) >> test-# ORDER BY 2 DESC, 3; >> xmin | xmax | cmin_cmax | is_combocid >> -------+-------+-----------+------------- >> 80678 | 80678 | 0 | t >> 80678 | 80678 | 1 | t >> 80678 | 80678 | 2 | t >> 80677 | 80678 | 3 | f >> 80678 | 0 | 3 | f >> 80678 | 0 | 3 | f >> 80678 | 0 | 3 | f >> 80678 | 0 | 3 | f >> (8 rows) >> >> test=# END; >> COMMIT >> >> >> Now consider the case we are trying to solve >> >> drop table tt1; >> create table tt1(f1 int); >> >> BEGIN; >> insert into tt1 values(1); >> declare c50 cursor for select * from tt1; -- should show one row only >> insert into tt1 values(2); >> fetch all from c50; >> COMMIT; >> >> >> Consider Data node 1 log >> >> (a) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >> committed READ WRITE] >> (b) [exec_simple_query][1026][drop table tt1;] >> (c) [exec_simple_query][1026][PREPARE TRANSACTION 'T21075'] >> (d) [exec_simple_query][1026][COMMIT PREPARED 'T21075'] >> (e) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >> committed READ WRITE] >> (f) [exec_simple_query][1026][create table tt1(f1 int);] >> (g) [exec_simple_query][1026][PREPARE TRANSACTION 'T21077'] >> (h) [exec_simple_query][1026][COMMIT PREPARED 'T21077'] >> (i) [exec_simple_query][1026][START TRANSACTION ISOLATION LEVEL read >> committed READ WRITE] >> (j) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (1)] >> (k) [exec_simple_query][1026][INSERT INTO tt1 (f1) VALUES (2)] >> (l) [PostgresMain][4155][SELECT tt1.f1, tt1.ctid, pgxc_node_str() FROM >> tt1] >> (m) [exec_simple_query][1026][COMMIT TRANSACTION] >> >> The cursor currently shows both inserted rows because command id at data >> node in >> step (j) is 0 >> step (k) is 1 & >> step (l) is 2 >> >> 
Whereas we need the command IDs to be:
>>
>> step (j) should be 0
>> step (k) should be 2 &
>> step (l) should be 1
>>
>> This will solve the cursor visibility problem.
>>
>> To implement this I suggest we send command IDs to data nodes from the
>> coordinator, like we send gxid. The only difference will be that we do
>> not need to take command IDs from GTM, since they are only valid within
>> the transaction.
>>
>> See this example
>>
>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>  xmin | xmax | cmin | cmax | f1
>> ------+------+------+------+----
>> (0 rows)
>>
>> test=# begin;
>> BEGIN
>> test=# insert into tt1 values(1);
>> INSERT 0 1
>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>  xmin  | xmax | cmin | cmax | f1
>> -------+------+------+------+----
>>  80615 |    0 |    0 |    0 |  1
>> (1 row)
>>
>> test=# insert into tt1 values(2);
>> INSERT 0 1
>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>  xmin  | xmax | cmin | cmax | f1
>> -------+------+------+------+----
>>  80615 |    0 |    0 |    0 |  1
>>  80615 |    0 |    1 |    1 |  2
>> (2 rows)
>>
>> test=# insert into tt1 values(3);
>> INSERT 0 1
>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>  xmin  | xmax | cmin | cmax | f1
>> -------+------+------+------+----
>>  80615 |    0 |    0 |    0 |  1
>>  80615 |    0 |    1 |    1 |  2
>>  80615 |    0 |    2 |    2 |  3
>> (3 rows)
>>
>> test=# insert into tt1 values(4);
>> INSERT 0 1
>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>  xmin  | xmax | cmin | cmax | f1
>> -------+------+------+------+----
>>  80615 |    0 |    0 |    0 |  1
>>  80615 |    0 |    1 |    1 |  2
>>  80615 |    0 |    2 |    2 |  3
>>  80615 |    0 |    3 |    3 |  4
>> (4 rows)
>>
>> test=# end;
>> COMMIT
>> test=#
>> test=#
>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>  xmin  | xmax | cmin | cmax | f1
>> -------+------+------+------+----
>>  80615 |    0 |    0 |    0 |  1
>>  80615 |    0 |    1 |    1 |  2
>>  80615 |    0 |    2 |    2 |  3
>>  80615 |    0 |    3 |    3 |  4
>> (4 rows)
>>
>> test=# insert into tt1 values(5);
>> INSERT 0 1
>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>  xmin  | xmax | cmin | cmax | f1
>> -------+------+------+------+----
>>  80615 |    0 |    0 |    0 |  1
>>  80615 |    0 |    1 |    1 |  2
>>  80615 |    0 |    2 |    2 |  3
>>  80615 |    0 |    3 |    3 |  4
>>  80616 |    0 |    0 |    0 |  5
>> (5 rows)
>>
>> test=# insert into tt1 values(6);
>> INSERT 0 1
>> test=#
>> test=#
>> test=# select xmin,xmax,cmin,cmax,* from tt1;
>>  xmin  | xmax | cmin | cmax | f1
>> -------+------+------+------+----
>>  80615 |    0 |    0 |    0 |  1
>>  80615 |    0 |    1 |    1 |  2
>>  80615 |    0 |    2 |    2 |  3
>>  80615 |    0 |    3 |    3 |  4
>>  80616 |    0 |    0 |    0 |  5
>>  80617 |    0 |    0 |    0 |  6
>> (6 rows)
>>
>> Note that at the end of the multi-statement transaction the command ID
>> gets reset to zero.
>>
>> --
>> Abbas
>> Architect
>> EnterpriseDB Corporation
>> The Enterprise PostgreSQL Company
>
> --
> Abbas
> Architect
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company

--
Michael Paquier
http://michael.otacoo.com |
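The proposal quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Postgres-XC code: FakeGTM, DataNode and CoordinatorTransaction are hypothetical names. The coordinator takes the gxid from a GTM stand-in, keeps the command ID as a plain per-transaction counter, and sends both to the datanode with each write. As in the transcript, only the INSERTs consume a command ID.

```python
from dataclasses import dataclass, field
from itertools import count


class FakeGTM:
    """Hypothetical stand-in for the GTM: hands out global transaction IDs."""

    def __init__(self, first_gxid):
        self._gxids = count(first_gxid)

    def next_gxid(self):
        return next(self._gxids)


@dataclass
class DataNode:
    """Hypothetical datanode that stamps rows with the IDs it is sent."""

    rows: list = field(default_factory=list)

    def insert(self, gxid, command_id, value):
        # xmin and cmin come from the coordinator, not from a local counter.
        self.rows.append({"xmin": gxid, "cmin": command_id, "f1": value})


class CoordinatorTransaction:
    """One transaction on the coordinator (illustrative only)."""

    def __init__(self, gtm, datanode):
        self.gxid = gtm.next_gxid()  # global: has to come from the GTM
        self.command_id = 0          # local: only valid within this transaction
        self.datanode = datanode

    def insert(self, value):
        self.datanode.insert(self.gxid, self.command_id, value)
        self.command_id += 1         # the next write gets the next command ID


gtm = FakeGTM(first_gxid=80615)
node = DataNode()

txn = CoordinatorTransaction(gtm, node)  # begin;
for v in (1, 2, 3, 4):
    txn.insert(v)                        # four inserts in one transaction

for v in (5, 6):                         # two single-statement transactions
    CoordinatorTransaction(gtm, node).insert(v)

for row in node.rows:
    print(row["xmin"], row["cmin"], row["f1"])
```

Because the counter lives in the transaction object, a new transaction starts again at command ID 0 without any round trip to the GTM, which matches the xmin/cmin values in the psql session above.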
From: Michael P. <mic...@gm...> - 2012-06-19 04:37:05
|
I got that this is not a patch for review but...

On Mon, Jun 18, 2012 at 8:32 PM, Ashutosh Bapat <ash...@en...> wrote:

> Hi All,
> I am working on 3534757.
>
> Currently, the JOIN reduction happens after the paths are finalized. The
> path creation does not take into consideration whether a particular JOIN is
> shippable or not. Hence, it can create paths with JOIN orders which are not
> necessarily pushable to the data-nodes.
>
> While creating paths, we create the relations for every possible JOIN and
> also build the set of paths (ways to plan) for that JOIN and store it in
> the corresponding path node. In XC, we will also create the RemoteQuery paths
> for each JOIN relation if the JOIN is shippable to the datanode/s. In this
> remote query path, we store the list of datanodes where this JOIN is
> shippable and also the JOIN tree corresponding to this JOIN. (Remember
> that a JOIN can be between two plain relations or two join relations or a
> join relation and a plain one.) This JOIN tree will be used to construct the
> FROM and WHERE clauses of the SQL query to be constructed while creating
> the RemoteQuery plan.
>
> PFA the WIP patch for this algorithm. This patch may not even compile but
> it should be sufficient to give an idea about the approach.
>

It would be nice to have the APIs related to path determination of remote
joins in a separate file, as mentioned in your patch. Something like
optimizer/path/remotepath.c?

Just by looking at your code, you are removing calls to
pgxc_merge_exec_nodes, where the node list can be reduced depending on the
subcluster where data is located for each relation. Please be sure to test
the new join paths for that as well.

Regards,
--
Michael Paquier
http://michael.otacoo.com |
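To make the discussion above concrete, here is a toy Python sketch of attaching a remote path to a join relation during path creation. Everything in it is an assumption for illustration: Rel, RemoteJoinPath and try_remote_join are hypothetical names, and the shippability rule (the two inputs must share at least one datanode) is a deliberate simplification. The real conditions in Postgres-XC, and what pgxc_merge_exec_nodes actually does, are not modelled here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Union


@dataclass(frozen=True)
class Rel:
    """A plain relation and the datanodes holding its data (hypothetical)."""
    name: str
    nodes: FrozenSet[str]


@dataclass(frozen=True)
class RemoteJoinPath:
    """Hypothetical remote path: the join tree plus the nodes it can run on."""
    tree: tuple
    nodes: FrozenSet[str]


Input = Union[Rel, RemoteJoinPath]


def tree_of(rel: Input):
    # A plain relation is a leaf; a join relation already carries its tree.
    return rel.name if isinstance(rel, Rel) else rel.tree


def try_remote_join(left: Input, right: Input) -> Optional[RemoteJoinPath]:
    # Toy rule (assumption): shippable only if both inputs share a datanode.
    common = left.nodes & right.nodes
    if not common:
        return None  # no remote path; only local join paths remain
    return RemoteJoinPath(tree=(tree_of(left), tree_of(right)), nodes=common)


def leaves(tree):
    # Flatten the join tree into relation names, e.g. for building a FROM list.
    if isinstance(tree, str):
        return [tree]
    return [name for sub in tree for name in leaves(sub)]


t1 = Rel("t1", frozenset({"dn1", "dn2"}))
t2 = Rel("t2", frozenset({"dn2", "dn3"}))
t3 = Rel("t3", frozenset({"dn3"}))

j12 = try_remote_join(t1, t2)     # node list reduced to the shared node dn2
j23 = try_remote_join(t2, t3)     # node list reduced to the shared node dn3
j123 = try_remote_join(j12, t3)   # dn2 vs dn3: no remote path for this order
j1_23 = try_remote_join(t1, j23)  # dn1, dn2 vs dn3: no remote path either
```

With relations spread over different node subsets, the node list shrinks at each join level and some join orders end up with no remote path at all, which is the kind of case the node-list reduction comment above asks to be tested.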