Details
- Type: Bug
- Status: In Progress
- Priority: Major
- Resolution: Unresolved
- Affects Version: 1.4.1
- Fix Version: None
Description
Problem
This is the method that sends the delete command in MoveManager:
```java
private void sendDeleteCommand(
    final ContainerInfo containerInfo, final DatanodeDetails datanode)
    throws ContainerReplicaNotFoundException, ContainerNotFoundException,
    NotLeaderException {
  int replicaIndex = getContainerReplicaIndex(
      containerInfo.containerID(), datanode);
  long deleteTimeout = moveTimeout - replicationTimeout;
  long now = clock.millis();
  replicationManager.sendDeleteCommand(
      containerInfo, replicaIndex, datanode, true, now + deleteTimeout);
}
```
It calculates deleteTimeout as moveTimeout - replicationTimeout, and then sends the delete command with an SCM expiration timestamp of current time + deleteTimeout. This is wrong: the delete expiration timestamp should actually be "the time at which the move was started + moveTimeout."
This diagram can help with visualisation; the key is that move = replicate + delete.
/A/------------------------------------------------/B/-----------/C/
A = move start time
B = move start time + replication timeout
C = move start time + move timeout
The time duration that replicate command gets is replicationTimeout, and the time duration that the total move gets is moveTimeout.
So, the timestamp at which the replicate command should expire is moveStart + replicationTimeout (which is what the code already does). And the timestamp at which the delete command should expire is moveStart + moveTimeout (this is the correction that needs to be made in the code).
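The two expiration timestamps can be sketched as follows (an illustrative standalone snippet; the method and variable names are hypothetical stand-ins, not the actual MoveManager fields):

```java
public class MoveTimeouts {
  // B in the diagram: replicate command expires at
  // move start + replication timeout (already correct in the code).
  static long replicateExpiry(long moveStartMs, long replicationTimeoutMs) {
    return moveStartMs + replicationTimeoutMs;
  }

  // C in the diagram: delete command should expire at
  // move start + move timeout (the proposed correction).
  static long deleteExpiry(long moveStartMs, long moveTimeoutMs) {
    return moveStartMs + moveTimeoutMs;
  }

  public static void main(String[] args) {
    long minuteMs = 60_000L;
    long moveStartMs = 0L;                       // A: move start time
    long replicationTimeoutMs = 50 * minuteMs;   // 50m
    long moveTimeoutMs = 55 * minuteMs;          // 55m
    // Replicate gets [A, B]; delete gets the remainder, ending at C.
    System.out.println(replicateExpiry(moveStartMs, replicationTimeoutMs));
    System.out.println(deleteExpiry(moveStartMs, moveTimeoutMs));
  }
}
```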
This bug causes the delete expiration timestamp to be in the past on the Datanode, because Replication Manager (through which the command is actually sent) further reduces the Datanode-side expiration timestamp by event.timeout.datanode.offset. So whenever moveTimeout - replicationTimeout < event.timeout.datanode.offset, the expiration time on the DN is in the past.
Example and Repro
For example, consider the following configs:
hdds.container.balancer.move.replication.timeout=50m, hdds.container.balancer.move.timeout=55m,
hdds.scm.replication.event.timeout.datanode.offset=6m.
MoveManager#sendDeleteCommand calls ReplicationManager#sendDeleteCommand with an SCM expiration timestamp of now + moveTimeout - moveReplicationTimeout, which is now + 55m - 50m, i.e. now + 5 minutes.
The Replication Manager method further calls sendDatanodeCommand, which calculates the Datanode expiration timestamp as
datanodeDeadline = scmDeadlineEpochMs - rmConf.getDatanodeTimeoutOffset()
which translates to now + 5 minutes - 6 minutes, which is in the past.
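The arithmetic above can be reproduced in isolation (a minimal sketch; the local variable names are illustrative and do not correspond to actual Ozone fields):

```java
public class DeadlineRepro {
  public static void main(String[] args) {
    long minuteMs = 60_000L;
    long now = 1_700_000_000_000L; // any fixed "current time" in epoch millis

    // Example configs from the description:
    long moveTimeout = 55 * minuteMs;        // move.timeout = 55m
    long replicationTimeout = 50 * minuteMs; // move.replication.timeout = 50m
    long datanodeOffset = 6 * minuteMs;      // event.timeout.datanode.offset = 6m

    // Buggy SCM deadline: now + (moveTimeout - replicationTimeout) = now + 5m
    long scmDeadline = now + (moveTimeout - replicationTimeout);

    // Datanode deadline: scmDeadline - offset = now + 5m - 6m = now - 1m
    long datanodeDeadline = scmDeadline - datanodeOffset;

    System.out.println(datanodeDeadline < now); // true: already expired on arrival
  }
}
```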
We also need to ensure the balancer cannot be configured like this in the first place; that validation can be handled in another Jira - https://issues.apache.org/jira/browse/HDDS-13068.
Solution
For this Jira, a simple fix is to record the time at which the move is scheduled in the MoveManager#pendingMoves map, then use that time to calculate the delete expiration timestamp when sending the delete command.
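A sketch of that approach (the class, field, and method names here are hypothetical; in the real code the scheduling time would live in, or alongside, the pendingMoves map entries):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MoveStartTimes {
  // Hypothetical stand-in for MoveManager#pendingMoves bookkeeping:
  // container id -> clock.millis() at the time the move was scheduled.
  private final Map<Long, Long> moveStartTimes = new ConcurrentHashMap<>();

  public void recordMoveStart(long containerId, long nowMs) {
    moveStartTimes.put(containerId, nowMs);
  }

  // Delete expiry = moveStart + moveTimeout (point C in the diagram),
  // instead of the buggy now + (moveTimeout - replicationTimeout).
  public long deleteExpiry(long containerId, long moveTimeoutMs) {
    return moveStartTimes.get(containerId) + moveTimeoutMs;
  }
}
```

With this, the delete command's remaining lifetime is whatever is left of the move's budget at the moment the replicate phase finishes, rather than a fixed (and possibly too-small) moveTimeout - replicationTimeout window measured from "now".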