
Reindex data stream indices on different nodes #125171


Merged

Conversation

masseyke
Member

@masseyke masseyke commented Mar 18, 2025

ReindexDataStreamIndexTransportAction currently always runs TransportReindexAction on the local node. TransportReindexAction allows a pipeline to be passed in. When this happens, the pipeline is run on the local node, and the output data is then sent to whichever node holds the shards for indexing. We have found this to be a bottleneck in performance testing. For example, we tested a single data stream with 100 10-GB indices on a 10-node cluster, configuring data stream reindex to allow 100 indices to be reindexed at once with no throttling. We found that one node (the one where the reindex data stream task was running) averaged 100% CPU use, almost all of it pipeline execution, while the other nodes averaged ~15%, almost all of it indexing-related.

An initial change was to modify TransportReindexAction to round-robin the nodes that it uses to handle slices of the data. In the example above, each index is divided into several slices, and each slice is sent to a different node. This reduced the total reindex time for the 100-index data stream to one third of what it had been before.

This PR is a little less risky. It does the round-robin logic inside of data stream reindex. Pipelines for all documents of all slices of a single index are still executed on a single node, but each index within a data stream is potentially sent to a different node. This gets us a similar performance increase in most data stream reindex use cases, without the risk of touching something that is in much wider use, and it does not prevent us from making the other change later on. The two changes are actually good for different use cases: this one helps when you have many indices, especially many small indices, while the change to TransportReindexAction would help when you have one very large index.

The pipeline that is run in most cases is the one added in #121617.
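The per-index selection this PR describes can be sketched as a minimal standalone class. The names here are illustrative, not the actual implementation: the real logic lives in ReindexDataStreamIndexTransportAction and operates on DiscoveryNode arrays, while plain strings are used below for brevity.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of per-index round-robin ingest-node selection.
// A shared counter advances once per index, so successive indices
// cycle through the available ingest nodes.
class RoundRobinNodeSelector {
    private final AtomicInteger offsetGenerator;

    RoundRobinNodeSelector(int initialOffset) {
        this.offsetGenerator = new AtomicInteger(initialOffset);
    }

    // Picks the node for the next index's reindex request.
    // Math.floorMod keeps the result non-negative even after the
    // counter overflows Integer.MAX_VALUE.
    String next(String[] ingestNodes) {
        if (ingestNodes.length == 0) {
            throw new IllegalStateException("No ingest nodes in cluster");
        }
        return ingestNodes[Math.floorMod(offsetGenerator.incrementAndGet(), ingestNodes.length)];
    }
}
```

Each call advances the shared counter, so a data stream with many indices spreads its pipeline work across every ingest node in the cluster rather than concentrating it on one.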

@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine added the Team:Data Management label Mar 18, 2025
@elasticsearchmachine
Collaborator

Hi @masseyke, I've created a changelog YAML for you.

@lukewhiting lukewhiting requested a review from Copilot March 19, 2025 15:58
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR implements a round-robin strategy for reindexing data stream indices by routing reindex requests to different ingest nodes, aiming to alleviate pipeline execution bottlenecks on a single node. Key changes include:

  • Updating ReindexDataStreamIndexTransportAction to use transportService.sendRequest to distribute work among ingest nodes.
  • Modifying tests in ReindexDataStreamIndexTransportActionTests to simulate cluster states with various ingest node configurations and to validate round-robin behavior.
  • Adding a changelog entry to document the enhancement.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

Files changed:

  • x-pack/plugin/migrate/src/main/java/org/elasticsearch/xpack/migrate/action/ReindexDataStreamIndexTransportAction.java — Updates reindex logic to round-robin requests among ingest nodes, replacing client.execute with transportService.sendRequest.
  • x-pack/plugin/migrate/src/test/java/org/elasticsearch/xpack/migrate/action/ReindexDataStreamIndexTransportActionTests.java — Updates tests to properly mock cluster states and verify that round-robin node assignment behaves as expected.
  • docs/changelog/125171.yaml — Adds a changelog entry summarizing the enhancement for reindexing data stream indices.
Comments suppressed due to low confidence (2)

x-pack/plugin/migrate/src/test/java/org/elasticsearch/xpack/migrate/action/ReindexDataStreamIndexTransportActionTests.java:270

  • [nitpick] Consider capturing and asserting the entire sequence of nodes returned by the round-robin logic (e.g. using getAllValues() from the captor) to more robustly verify the round-robin ordering.
for (int i = 0; i < ingestNodeCount - 1; i++) {

x-pack/plugin/migrate/src/main/java/org/elasticsearch/xpack/migrate/action/ReindexDataStreamIndexTransportAction.java:323

  • [nitpick] The error message 'No ingest nodes in cluster' could be further enhanced by suggesting possible steps or checks for cluster configuration to help users troubleshoot the issue.
if (ingestNodes.length == 0) {

if (ingestNodes.length == 0) {
listener.onFailure(new NoNodeAvailableException("No ingest nodes in cluster"));
} else {
DiscoveryNode ingestNode = ingestNodes[Math.floorMod(ingestNodeOffsetGenerator.incrementAndGet(), ingestNodes.length)];
Contributor

@lukewhiting lukewhiting Mar 19, 2025


Checking the APIs for Math.floorMod and AtomicInteger, I think this line should correctly handle the case where the AtomicInteger overflows and the dividend becomes negative, but is it worth adding a test for that case to future-proof it, or are the current tests OK to handle it passively?

Member Author


I just tried it out, and it is a minor bug: Math.floorMod(Integer.MAX_VALUE, 17) is equal to Math.floorMod(Integer.MAX_VALUE + 1, 17). So if the test were to start with a value near Integer.MAX_VALUE it would fail (although I'm not too worried about the round-robin repeating a node twice in that very rare situation).
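The wrap-around behavior described here can be checked with a couple of lines of plain Java. This is a standalone demonstration snippet, not code from the PR:

```java
// Integer.MAX_VALUE + 1 silently wraps to Integer.MIN_VALUE in Java int arithmetic.
// For a divisor of 17, Math.floorMod maps both values to the same bucket (8),
// which is the "same node chosen twice" case discussed in this thread.
public class FloorModOverflowDemo {
    public static void main(String[] args) {
        System.out.println(Math.floorMod(Integer.MAX_VALUE, 17));     // 8
        System.out.println(Math.floorMod(Integer.MAX_VALUE + 1, 17)); // 8
    }
}
```

So across the overflow boundary the round-robin index repeats once, rather than producing a negative array index, which is why the behavior is harmless.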

Member Author


I changed the maximum initial value of that random number to be much smaller, so that it never exceeds Integer.MAX_VALUE and the test will never fail. I think the actual behavior (choosing the same node twice, once every 4.3 billion requests) is harmless and not worth the complexity of fixing.

Contributor

@lukewhiting lukewhiting left a comment


One NABD comment but otherwise looks like a nice speed improvement :-) 👍🏻

@masseyke masseyke merged commit 24132d3 into elastic:main Mar 20, 2025
17 checks passed
@masseyke masseyke deleted the round-robin-reindex-data-stream-indices branch March 20, 2025 12:50
masseyke added a commit to masseyke/elasticsearch that referenced this pull request Mar 20, 2025
@masseyke
Copy link
Member Author

💚 All backports created successfully

  • 8.x — backport created
  • 9.0 — backport created
  • 8.18 — backport created

Questions? Please refer to the Backport tool documentation.

masseyke added a commit to masseyke/elasticsearch that referenced this pull request Mar 20, 2025
elasticsearchmachine pushed a commit that referenced this pull request Mar 20, 2025
afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Mar 21, 2025
smalyshev pushed a commit to smalyshev/elasticsearch that referenced this pull request Mar 21, 2025
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025