From: Ashutosh B. <ash...@en...> - 2013-03-25 11:13:48
|
Hi All, I am working on using remote sorting for merge joins. The idea is while using merge join at the coordinator, get the data sorted from the datanodes; for replicated relations, we can get all the rows sorted and for distributed tables we have to get sorted runs which can be merged at the coordinator. For merge join the sorted inner relation needs to be randomly accessible. For replicated relations this can be achieved by materialising the result. But for distributed relations, we do not materialise the sorted result at coordinator but compute the sorted result by merging the sorted results from individual nodes on the fly. For distributed relations, the connection to the datanodes themselves are used as logical tapes (which provide the sorted runs). The final result is computed on the fly by choosing the smallest or greatest row (as required) from the connections. For a Sort node the materialised result can reside in memory (if it fits there) or on one of the logical tapes used for merge sort. So, in order to provide random access to the sorted result, we need to materialise the result either in the memory or on the logical tape. In-memory materialisation is not easily possible since we have already resorted for tape based sort, in case of distributed relations and to materialise the result on tape, there is no logical tape available in current algorithm. To make it work, there are following possible ways 1. When random access is required, materialise the sorted runs from individual nodes onto tapes (one tape for each node) and then merge them on one extra tape, which can be used for materialisation. 2. Use a mix of connections and logical tape in the same tape set. Merge the sorted runs from connections on a logical tape in the same logical tape set. While the second one looks attractive from performance perspective (it saves writing and reading from the tape), it would make the merge code ugly by using mixed tapes. The read calls for connection and logical tape are different and we will need both on the logical tape where the final result is materialized. So, I am thinking of going with 1, in fact, to have same code to handle remote sort, use 1 in all cases (whether or not materialization is required). Had original authors of remote sort code thought about this materialization? Anything they can share on this topic? Any comment? -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |