Apache Ozone / HDDS-13067

Container Balancer delete commands are sent with an expiration time in the past


Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.1
    • Fix Version/s: None
    • Component/s: SCM

    Description

      Problem

      This is the method that sends the delete command in MoveManager:

        private void sendDeleteCommand(
            final ContainerInfo containerInfo, final DatanodeDetails datanode)
            throws ContainerReplicaNotFoundException, ContainerNotFoundException,
            NotLeaderException {
          int replicaIndex = getContainerReplicaIndex(
              containerInfo.containerID(), datanode);
          // BUG: the delete deadline is anchored to "now" (the time the delete
          // command is sent) instead of the time the move was started.
          long deleteTimeout = moveTimeout - replicationTimeout;
          long now = clock.millis();
          replicationManager.sendDeleteCommand(
              containerInfo, replicaIndex, datanode, true, now + deleteTimeout);
        }
      

      It calculates deleteTimeout as moveTimeout - replicationTimeout, and then sends the delete command with an SCM expiration timestamp of current time + deleteTimeout. This is wrong: the delete expiration timestamp should actually be "the time at which the move was started + moveTimeout".

      This diagram can help with visualisation; the key point is that move = replicate + delete.

      /A/------------------------------------------------/B/-----------/C/
      

      A = move start time
      B = move start time + replication timeout
      C = move start time + move timeout

      The replicate command is given a duration of replicationTimeout, and the move as a whole is given moveTimeout.
      So the replicate command should expire at moveStart + replicationTimeout (which the code gets right), and the delete command should expire at moveStart + moveTimeout (which is the correction this Jira needs to make).

      This bug causes the delete expiration timestamp to be in the past on the Datanode, because Replication Manager (via which the command is actually sent) further reduces the Datanode-side expiration timestamp by event.timeout.datanode.offset. So whenever moveTimeout - replicationTimeout < event.timeout.datanode.offset, the expiration time on the DN is in the past.
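      The triggering condition can be written as a one-line predicate. This is a sketch for illustration only; the class and method names are hypothetical, not part of Ozone:

      ```java
      public class ExpiredDeadlineCheck {
          // True when the buggy delete deadline would land in the past on the
          // Datanode: moveTimeout - replicationTimeout < datanode offset.
          static boolean deleteExpiresInPast(long moveTimeoutMs,
                                             long replicationTimeoutMs,
                                             long datanodeOffsetMs) {
              return moveTimeoutMs - replicationTimeoutMs < datanodeOffsetMs;
          }

          public static void main(String[] args) {
              long minute = 60_000L;
              // 55m move timeout, 50m replication timeout, 6m DN offset
              System.out.println(
                  deleteExpiresInPast(55 * minute, 50 * minute, 6 * minute));
          }
      }
      ```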

      Example and Repro

      For example, consider the following configs:
      hdds.container.balancer.move.replication.timeout=50m, hdds.container.balancer.move.timeout=55m,
      hdds.scm.replication.event.timeout.datanode.offset=6m.

      MoveManager#sendDeleteCommand calls ReplicationManager#sendDeleteCommand with an SCM expiration timestamp of now + moveTimeout - replicationTimeout, which is now + 55m - 50m, i.e. now + 5 minutes.

      The Replication Manager method further calls sendDatanodeCommand, which calculates the Datanode expiration timestamp as

      datanodeDeadline =
              scmDeadlineEpochMs - rmConf.getDatanodeTimeoutOffset()
      

      which translates to now + 5 minutes - 6 minutes, i.e. now - 1 minute: in the past.
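      Plugging the example configs into the two formulas makes the underflow concrete. A minimal sketch (the class and method names are illustrative, not Ozone's actual API), working in minutes relative to the time the delete command is sent:

      ```java
      public class DeleteDeadlineDemo {
          // Example configs from above, in minutes.
          static final long MOVE_TIMEOUT = 55;
          static final long REPLICATION_TIMEOUT = 50;
          static final long DATANODE_OFFSET = 6;

          // Buggy calculation: deadline anchored to "now" when the delete is sent.
          static long buggyDatanodeDeadline(long nowMin) {
              long scmDeadline = nowMin + (MOVE_TIMEOUT - REPLICATION_TIMEOUT); // now + 5m
              return scmDeadline - DATANODE_OFFSET;                             // now - 1m
          }

          public static void main(String[] args) {
              long now = 0; // measure everything relative to the send time
              // -1, i.e. one minute before the command was even sent
              System.out.println(buggyDatanodeDeadline(now));
          }
      }
      ```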

      We also need to ensure the balancer cannot be configured like this in the first place; that validation can be handled in another Jira - https://issues.apache.org/jira/browse/HDDS-13068.

      Solution

      For this Jira, a simple fix is to record the time at which the move was scheduled in the MoveManager#pendingMoves map, then use that time to calculate the delete expiration timestamp when sending the delete command.
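      A sketch of that fix, anchoring the delete deadline to the recorded move start time (point C in the diagram above). The names here are hypothetical; the real change would live in MoveManager and use the scheduling time kept in pendingMoves:

      ```java
      public class FixedDeleteDeadline {
          // Corrected calculation: the delete expires at
          // moveStartTime + moveTimeout, regardless of when the delete is sent.
          static long deleteDeadline(long moveStartMillis, long moveTimeoutMillis) {
              return moveStartMillis + moveTimeoutMillis;
          }

          public static void main(String[] args) {
              long moveStart = 1_000_000L;            // when the move was scheduled
              long moveTimeout = 55 * 60 * 1000L;     // 55 minutes, as in the repro
              System.out.println(deleteDeadline(moveStart, moveTimeout));
          }
      }
      ```

      With this anchoring, the Datanode deadline becomes moveStart + moveTimeout - event.timeout.datanode.offset, which stays in the future as long as the replicate phase finished before point C.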

Attachments

Issue Links

Activity

People

    Assignee: Tejaskriya (tejaskriya)
    Reporter: Siddhant Sangwan (siddhant)
    Votes: 0
    Watchers: 2