[Postgres-xc-developers] Using remote sorting for merge-join

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Hi All,
I am working on using remote sorting for merge joins. The idea is while
using merge join at the coordinator, get the data sorted from the
datanodes; for replicated relations, we can get all the rows sorted and for
distributed tables we have to get sorted runs which can be merged at the
coordinator. For merge join the sorted inner relation needs to be randomly
accessible. For replicated relations this can be achieved by materialising
the result. But for distributed relations, we do not materialise the sorted
result at coordinator but compute the sorted result by merging the sorted
results from individual nodes on the fly. For distributed relations, the
connection to the datanodes themselves are used as logical tapes (which
provide the sorted runs). The final result is computed on the fly by
choosing the smallest or greatest row (as required) from the connections.

For a Sort node the materialised result can reside in memory (if it fits
there) or on one of the logical tapes used for merge sort. So, in order to
provide random access to the sorted result, we need to materialise the
result either in the memory or on the logical tape. In-memory
materialisation is not easily possible since we have already resorted for
tape based sort, in case of distributed relations and to materialise the
result on tape, there is no logical tape available in current algorithm. To
make it work, there are following possible ways

1. When random access is required, materialise the sorted runs from
individual nodes onto tapes (one tape for each node) and then merge them on
one extra tape, which can be used for materialisation.
2. Use a mix of connections and logical tape in the same tape set. Merge
the sorted runs from connections on a logical tape in the same logical tape
set.

While the second one looks attractive from performance perspective (it
saves writing and reading from the tape), it would make the merge code ugly
by using mixed tapes. The read calls for connection and logical tape are
different and we will need both on the logical tape where the final result
is materialized. So, I am thinking of going with 1, in fact, to have same
code to handle remote sort, use 1 in all cases (whether or not
materialization is required).

Had original authors of remote sort code thought about this
materialization? Anything they can share on this topic?
Any comment?
-- 
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Enterprise Postgres Company