From: Abbas B. <abb...@en...> - 2013-04-19 20:22:56
Hi,

Here are the proposed steps to remove a node from the cluster.

Removing an existing coordinator
================================

Assume a two-coordinator cluster, COORD_1 & COORD_2, and suppose we want to remove COORD_2 for some reason.

1. Stop the coordinator to be removed; in our example, COORD_2.

2. Connect to any coordinator except the one being removed. In our example, assuming COORD_1 is running on port 5432:

   psql postgres -p 5432

3. Drop the coordinator to be removed. For example, to drop COORD_2:

   DROP NODE COORD_2;

4. Update the connection information cached in the pooler:

   SELECT pgxc_pool_reload();

COORD_2 is now removed from the cluster, and COORD_1 works as if COORD_2 never existed.

CAUTION: If COORD_2 is still running and clients are connected to it, any queries issued through it would create inconsistencies in the cluster. Note that there is no need to block DDLs, because they will fail anyway between step 1 and step 4.

Removing an existing datanode
=============================

Assume a two-coordinator cluster, COORD_1 & COORD_2, with three datanodes DATA_NODE_1, DATA_NODE_2 & DATA_NODE_3, and suppose we want to remove DATA_NODE_3 for some reason. Further assume there is a table named rr_abc, distributed in round-robin fashion, with rows on all three datanodes.

1. Block DMLs, so that while we are shifting data off the datanode in step 2, no one can concurrently insert data into it. Here we will need to add a system function similar to pgxc_lock_for_backup; this is a to-do item.

2. Transfer the data from the datanode being removed to the remaining datanodes, for all tables in all databases. For example, to shift the data of rr_abc to the rest of the nodes:

   ALTER TABLE rr_abc DELETE NODE (DATA_NODE_3);

3. Confirm that no data is left on the datanode being removed. For example, to confirm there is no data left on DATA_NODE_3:

   select c.pcrelid from pgxc_class c, pgxc_node n
    where n.node_name = 'DATA_NODE_3' and n.oid = ANY (c.nodeoids);

4. Stop the datanode server to be removed. Any SELECTs involving this datanode will now start failing, and DMLs are already blocked, so the cluster will work only partially.

5. Connect to any coordinator. In our example, assuming COORD_1 is running on port 5432:

   psql postgres -p 5432

6. Drop the datanode to be removed. For example, to drop DATA_NODE_3:

   DROP NODE DATA_NODE_3;

7. Update the connection information cached in the pooler:

   SELECT pgxc_pool_reload();

8. Repeat steps 5, 6 & 7 for all the coordinators in the cluster.

9. Un-block DMLs.

DATA_NODE_3 is now removed from the cluster.

Comments are welcome.

--
*Abbas*
Architect
Ph: 92.334.5100153
Skype ID: gabbasb
www.enterprisedb.com

Follow us on Twitter @EnterpriseDB
Visit EnterpriseDB for tutorials, webinars, whitepapers <http://www.enterprisedb.com/resources-community> and more
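The per-coordinator part of the datanode-removal procedure (steps 5-8: connect to each coordinator, drop the node, reload the pool) can be sketched as a dry run. This is only an illustration of the command sequence, not part of the proposal: the function name, node name, and port list below are invented example values, and the sketch prints the psql invocations it would issue rather than running them against a live cluster.

```python
# Dry-run sketch of steps 5-8 above: for every coordinator in the
# cluster, drop the removed datanode and refresh the pooler cache.
# removal_commands, the node name, and the ports are hypothetical
# example values, not actual Postgres-XC tooling.

def removal_commands(node_to_drop, coordinator_ports):
    """Return the psql invocations steps 5-8 would run, in order."""
    commands = []
    for port in coordinator_ports:                        # step 8: repeat per coordinator
        for sql in (f"DROP NODE {node_to_drop};",         # step 6
                    "SELECT pgxc_pool_reload();"):        # step 7
            commands.append(f'psql postgres -p {port} -c "{sql}"')
    return commands

for cmd in removal_commands("DATA_NODE_3", [5432, 5433]):
    print(cmd)
```

Running the real commands instead of printing them would be a one-line change, but the ordering constraint is the point: the pool reload must follow the DROP NODE on every coordinator before DMLs are unblocked.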
From: Amit K. <ami...@en...> - 2013-04-19 11:33:25
On 17 April 2013 16:46, Ashutosh Bapat <ash...@en...> wrote:

> Hi Amit,
> Thanks for completing this tedious work. It's pretty complicated.
>
> As I understand it, the patch deals with the following.
>
> For after-row triggers, PG stores the fireable triggers as events, with the ctid of the row as a pointer to the row on which the event should be carried out. For INSERT and DELETE there is only one ctid, viz. new or old respectively; for UPDATE, the ctids of both the new and the old row are stored. Because after-row triggers are fired after queueing the events, we need some storage on the coordinator for the affected tuples, so that they can be retrieved while firing the triggers. We do not save the entire tuple in the trigger event, to save memory when multiple events need the same tuple. In PG, the ctid of the row suffices to fetch the row from the heap, which acts as the storage itself. In XC, however, we need some storage for the tuples to be fed to trigger events, and a pointer for each stored row. This pointer is saved in the trigger event and used to fetch the row. Your patch uses two tuplestores to store the old and new rows respectively. For UPDATE we use both tuplestores, but for INSERT only one of them.
>
> Here are my comments.
>
> 1. As I understand it, the tuplestore has a different kind of pointer than ctid, and thus you have created a union in the trigger event structure. Can we use hash-based storage instead of a tuplestore? Hash-based storage has these advantages: a. the existing ctid + nodeoid combination can be used as the key in the hash store, so no union is required (we only need to add an OID member), and the same ItemPointer structure can then be used instead of creating XC-specific prototypes; b. a hash is ideally random-access storage, unlike a tuplestore, which needs some kind of sequential access; c. in places the code first has to get a pointer into the tuplestore before actually adding the row, which complicates the code; hash storage does not have this problem, since the key is independent of the position in the store.
>
> 2. Using two separate tuplestores for new and old tuples wastes memory. A tuplestore allocates 1K of memory by default, so two tuplestores require double that amount, and in the worst case, when they overflow to disk, we get two files on the file system, causing scattered writes that hurt performance. Sharing one store means the same row pointer cannot be used for OLD and NEW, but that should be fine, as PG itself doesn't need that condition.
>
> 3. The tuplestore management code is too tied to the static structures in trigger.c. We need to isolate this code in a separate file, so that the approach can be reused for other features, such as constraints, if required. Please move it into a separate file with well-defined interfaces: a function to add a row to the storage, get its pointer, fetch the row from storage, delete the row from storage (?), destroy the storage, etc., and use them for the trigger functionality. The file should have a prologue describing the need for these interfaces and a description of the interfaces themselves. In fact, if this infrastructure is also needed in PG, we should put it in PG.
>
> 4. The two tuplestore indices are hardcoded as 0 and 1. Can we use macros instead, or even better, separate variables for the two? The same goes for all the 2-sized arrays defined for the same purpose.
>
> 5. Please look at the current trigger regression tests. If they do not cover all the possible test scenarios, please add them to the regression suite. Testing all the scenarios (the various combinations of trigger types and DMLs) is critical here.
>
> If you find that the current implementation is working fine, all the above points can be taken up after the 1.1 release: the testing can be taken up between beta and GA, and the others in the next release. But it's important to at least study these approaches.

Thanks Ashutosh for the valuable comments and for your patience in reviewing this work. I will come back to the above points once I commit the trigger support.

> Some specific comments.
>
> 1. In function pgxc_ar_init_rowstore(), we have used palloc0 + memcpy + pfree() instead of repalloc + zeroing the new entries. repalloc can extend the existing allocation without moving the contents (if possible), and it wouldn't fail when the sum of the allocated and the required memory is greater than the available memory but the required memory alone is less than what is available. So it's always advantageous to use repalloc. Why haven't we used repalloc here?

I had thought palloc0 + pfree would be simpler, but the repalloc code turned out to be just as simple. I have changed the code to use repalloc. Thanks.

> 2. Can we extend pgxc_ar_goto_end() into a goto-anywhere function, where end is a special position? E.g. pgxc_ar_goto(ARTupInfo, Pos), where Pos can be a valid index or the special position END. pgxc_ar_goto(ARTupInfo, END) would act as pgxc_ar_goto_end(), and pgxc_ar_goto(ARTupInfo, Pos != END) would replace the tuplestore advance loop in pgxc_ar_dofetch(). The function may accept a backwards flag if that is required.

So suppose in pgxc_ar_dofetch() I extract the first part, which scans the tuplestore up to the fetchpos, into a new function pgxc_ar_goto(), and continue the actual fetch in pgxc_ar_dofetch(). Then pgxc_ar_goto() would have to special-case the goto_end() functionality: the advance_by counter used to reach a required position cannot be used for going to the end, because that needs the tuplestore_eof() condition, which cannot be mixed into that code, and the advance-by code has to be skipped. So, since we have two different code paths for the two scenarios anyway, I think it is better to keep a separate function pgxc_ar_gotoend() for the scan-up-to-end scenario.

> On Mon, Apr 15, 2013 at 9:54 AM, Amit Khandekar <ami...@en...> wrote:
>>> On Fri, Apr 5, 2013 at 2:38 PM, Amit Khandekar <ami...@en...> wrote:
>>>> FYI .. I will use the following document to keep updating the implementation details for "Saving AR trigger rows in tuplestore":
>>>> https://docs.google.com/document/d/158IPS9npmfNsOWPN6ZYgPy91aowTUNP7L7Fl9zBBGqs/edit?usp=sharing
>>>
>>> --
>>> Pavan Deolasee
>>> http://www.linkedin.com/in/pavandeolasee
>>
>> Attached is the patch to support after-row triggers. The above doc is updated. Yet to analyse the regression tests. The attached test.sql is the one I used for unit testing; it is not yet ready to be inserted into the regression suite. I will be working next on the regression tests and Ashutosh's comments on before-row triggers.
>>
>> Also, I haven't yet rebased the rowtriggers branch over the new merge-related changes in the master branch. This patch is over the rowtriggers branch; I did not push it onto the rowtriggers branch, although I intended to, because I suspected possible issues if I pushed after the recent merge-related changes going on in the repository. First I will rebase all the rowtriggers branch changes onto the new master branch.
>>
>> _______________________________________________
>> Postgres-xc-developers mailing list
>> Pos...@li...
>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Enterprise Postgres Company
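Ashutosh's comment 1 above — keying the row store on the (ctid, nodeoid) pair the trigger event already carries, instead of on a position inside a tuplestore — can be illustrated with a toy sketch. This is plain Python, not the actual XC C code; the class and method names are invented purely for illustration of the random-access property being discussed.

```python
# Toy illustration of review comment 1: a row store keyed on
# (ctid, nodeoid) gives random access, so a trigger event can keep the
# key it already has rather than a tuplestore position. All names here
# are invented; the real implementation would be C code in trigger.c.

class AfterRowStore:
    """Maps (ctid, nodeoid) -> row tuple, standing in for hash storage."""
    def __init__(self):
        self._rows = {}

    def add(self, ctid, nodeoid, row):
        self._rows[(ctid, nodeoid)] = row

    def fetch(self, ctid, nodeoid):
        # Random access: no need to advance sequentially to a saved
        # position, as the tuplestore-based approach must.
        return self._rows[(ctid, nodeoid)]

store = AfterRowStore()
store.add(ctid=(0, 1), nodeoid=16384, row=("old", 42))
store.add(ctid=(0, 2), nodeoid=16385, row=("new", 43))
print(store.fetch((0, 2), 16385))   # fetched independently of insertion order
```

The same dictionary also makes the OLD/NEW split moot: both rows live in one store under distinct keys, which mirrors the memory argument in comment 2.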
From: Ashutosh B. <ash...@en...> - 2013-04-19 03:42:58
Then the perf is expected. There is too much IO on the same machine and too much context switching.

On Thu, Apr 18, 2013 at 7:35 PM, Abbas Butt <abb...@en...> wrote:
> All instances on the same machine.
>
> On Thu, Apr 18, 2013 at 4:38 PM, Ashutosh Bapat <ash...@en...> wrote:
>> Did you do it on a true cluster or by running all instances on the same machine? The latter would degrade the performance.
>>
>> On Thu, Apr 18, 2013 at 4:38 PM, Abbas Butt <abb...@en...> wrote:
>>> On Thu, Apr 18, 2013 at 8:43 AM, Ashutosh Bapat <ash...@en...> wrote:
>>>> Did you measure the performance?
>>>
>>> I tried, but I was getting very strange numbers. It took some hours but reported
>>>
>>> Time: 365649.353 ms
>>>
>>> which comes out to be some 6 minutes; I am not sure why.
>>>
>>>> On Thu, Apr 18, 2013 at 9:02 AM, Abbas Butt <abb...@en...> wrote:
>>>>> On Thu, Apr 18, 2013 at 1:07 AM, Abbas Butt <abb...@en...> wrote:
>>>>>> Hi,
>>>>>> Here is the review of the patch.
>>>>>>
>>>>>> Overall the patch is good to go. I have reviewed the code and found some minor errors, which I corrected; the revised patch is attached with this mail.
>>>>>>
>>>>>> I have tested both the case where the sort happens in memory and the case where it uses disk, and found both working.
>>>>>>
>>>>>> I agree that the approach used in the patch is cleaner and has a smaller footprint.
>>>>>>
>>>>>> I have corrected some whitespace errors and an unintentional change in function set_dbcleanup_callback:
>>>>>>
>>>>>> git apply /home/edb/Desktop/MergeSort/xc_sort.patch
>>>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:539: trailing whitespace.
>>>>>> void *fparams;
>>>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1012: trailing whitespace.
>>>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1018: trailing whitespace.
>>>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1087: trailing whitespace.
>>>>>> /*
>>>>>> /home/edb/Desktop/MergeSort/xc_sort.patch:1228: trailing whitespace.
>>>>>> size_t len, Oid msgnode_oid,
>>>>>> warning: 5 lines add whitespace errors.
>>>>>>
>>>>>> I am leaving a query running tonight that sorts 10M rows of a distributed table and returns the top 100 of them. I will report its outcome tomorrow morning.
>>>>>
>>>>> It worked. Here is the test case:
>>>>>
>>>>> 1. create table test1 (id integer primary key, padding text);
>>>>> 2. Load 10M rows.
>>>>> 3. select id from test1 order by 1 limit 100;
>>>>>
>>>>>> On Mon, Apr 1, 2013 at 11:02 AM, Koichi Suzuki <koi...@gm...> wrote:
>>>>>>> Thanks. Then 90% improvement means about 53% of the duration, while 50% means 67% of it. The number of queries in a given duration is 190 vs. 150, a difference of 40.
>>>>>>>
>>>>>>> Considering the needed resources, it may be okay to begin with materialization.
>>>>>>>
>>>>>>> Any other inputs?
>>>>>>> ----------
>>>>>>> Koichi Suzuki
>>>>>>>
>>>>>>> 2013/4/1 Ashutosh Bapat <ash...@en...>
>>>>>>>> On Mon, Apr 1, 2013 at 10:59 AM, Koichi Suzuki <koi...@gm...> wrote:
>>>>>>>>> I understand that materializing everything makes the code clearer and the implementation simpler and better structured.
>>>>>>>>>
>>>>>>>>> What do you mean by x% improvement? Does 90% improvement mean the total duration is 10% of the original?
>>>>>>>>
>>>>>>>> x% improvement means the duration reduces to 100/(100+x) of the non-pushdown scenario. Or in simpler words, the pushdown approach completes (100+x) queries in the same time in which the non-pushdown approach completes 100 queries.
>>>>>>>>
>>>>>>>>> 2013/3/29 Ashutosh Bapat <ash...@en...>
>>>>>>>>>> Hi All,
>>>>>>>>>> I measured the scale-up for both approaches: a. using datanode connections as tapes (the existing one), and b. materialising the result on tapes before merging (the approach I proposed). For 1M rows and 5 coordinators I found that approach (a) gives 90% improvement whereas approach (b) gives 50% improvement. Although the difference is significant, I feel that approach (b) is much cleaner than approach (a), doesn't have a large footprint compared to the PG code, and takes care of all the cases: 1. it materialises the sorted result, and 2. it handles any number of datanode connections without memory overrun. It's possible to improve it further if we avoid materialising the datanode result in a tuplestore.
>>>>>>>>>>
>>>>>>>>>> Patch attached for reference.
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 26, 2013 at 10:38 AM, Ashutosh Bapat <ash...@en...> wrote:
>>>>>>>>>>> On Tue, Mar 26, 2013 at 10:19 AM, Koichi Suzuki <koi...@gm...> wrote:
>>>>>>>>>>>> One thing we should think about for option 1 is: when the result is huge, applications have to wait a long time until they get the first row. Because this option may need disk writes, total resource consumption will be larger.
>>>>>>>>>>>
>>>>>>>>>>> Yes, I am aware of this fact. Please read the next paragraph and you will see that the current situation is no better.
>>>>>>>>>>>
>>>>>>>>>>>> I'm wondering if we can use a "cursor" at the database so that we can read each tape more simply; I mean, leave each query node open and read the next row from any query node.
>>>>>>>>>>>
>>>>>>>>>>> We do that right now. But because of such a simulated cursor (it's not a cursor per se; we just fetch the required result from the connection as the demand arises while merging runs), we observe the following.
>>>>>>>>>>>
>>>>>>>>>>> If the plan has multiple remote query nodes (as there will be in case of a merge join), we assign the same connection to these nodes. Before this assignment, the result from the previous connection is materialised at the coordinator. This means that when we get a huge result from the datanode, it will be materialised (at more cost than materialising it on tape, since this materialisation happens in a linked list, which is not optimized). We need to share a connection between more than one RemoteQuery node because the same transaction cannot work on two connections to the same server. Not only performance: the code has become ugly because of this approach. At various places in the executor we have special handling for sorting, which needs to be maintained.
>>>>>>>>>>>
>>>>>>>>>>> Instead, if we materialise all the results on tape and then proceed with step D5 in Knuth's algorithm for polyphase merge sort, the code will be much simpler and we won't lose much performance. In fact, we might be able to leverage fetching bulk data on a connection, which can be materialised on tape in bulk.
>>>>>>>>>>>
>>>>>>>>>>>> 2013/3/25 Ashutosh Bapat <ash...@en...>:
>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>> I am working on using remote sorting for merge joins. The idea is, while using a merge join at the coordinator, to get the data sorted from the datanodes: for replicated relations we can get all the rows sorted, and for distributed tables we have to get sorted runs which can be merged at the coordinator. For a merge join the sorted inner relation needs to be randomly accessible. For replicated relations this can be achieved by materialising the result. But for distributed relations we do not materialise the sorted result at the coordinator; we compute it by merging the sorted results from the individual nodes on the fly. For distributed relations, the connections to the datanodes themselves are used as logical tapes (which provide the sorted runs). The final result is computed on the fly by choosing the smallest or greatest row (as required) from the connections.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For a Sort node, the materialised result can reside in memory (if it fits there) or on one of the logical tapes used for the merge sort. So, in order to provide random access to the sorted result, we need to materialise it either in memory or on a logical tape. In-memory materialisation is not easily possible, since we have already resorted to tape-based sort in the case of distributed relations, and to materialise the result on tape there is no logical tape available in the current algorithm. To make it work, there are the following possible ways:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. When random access is required, materialise the sorted runs from individual nodes onto tapes (one tape for each node) and then merge them onto one extra tape, which can be used for materialisation.
>>>>>>>>>>>>> 2. Use a mix of connections and logical tapes in the same tape set: merge the sorted runs from connections onto a logical tape in the same logical tape set.
>>>>>>>>>>>>>
>>>>>>>>>>>>> While the second looks attractive from a performance perspective (it saves writing to and reading from the tape), it would make the merge code ugly by mixing tapes: the read calls for a connection and for a logical tape are different, and we would need both in the tape set where the final result is materialized. So I am thinking of going with 1; in fact, to have the same code handle remote sort, of using 1 in all cases (whether or not materialization is required).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Had the original authors of the remote sort code thought about this materialization? Anything they can share on this topic? Any comments?
>>>>>>
>>>>>> --
>>>>>> Abbas
>>>>>> Architect
>>>>>> EnterpriseDB Corporation
>>>>>> The Enterprise PostgreSQL Company
>>>>>>
>>>>>> Phone: 92-334-5100153
>>>>>> Website: www.enterprisedb.com
>>>>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/
>>>>>> Follow us on Twitter: http://www.twitter.com/enterprisedb
>>>>>>
>>>>>> This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
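The on-the-fly merge Ashutosh describes — treating each datanode connection as a logical tape supplying a sorted run, and repeatedly taking the smallest head row across the connections — is essentially a k-way merge. A minimal sketch, with in-memory iterators standing in for datanode connections (the node names are purely illustrative, and the real XC code is C in the executor, not Python):

```python
import heapq

# Each "connection" yields an already-sorted run of rows, standing in
# for a datanode used as a logical tape. heapq.merge repeatedly picks
# the smallest head row across the runs, producing the fully sorted
# result on the fly, without materialising the inputs first.
datanode_runs = [
    iter([1, 4, 7]),    # sorted run from DATA_NODE_1 (illustrative)
    iter([2, 5, 8]),    # sorted run from DATA_NODE_2
    iter([3, 6, 9]),    # sorted run from DATA_NODE_3
]

merged = list(heapq.merge(*datanode_runs))
print(merged)   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The random-access problem in the thread arises precisely because such a merge is single-pass: once a row is consumed from a run it is gone, so a merge join that must rescan the inner relation needs the merged output materialised somewhere (memory or a tape), which is what options 1 and 2 above are about.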