Skip to content

HDDS-9354. LegacyReplicationManager: Unhealthy replicas of a sufficiently replicated container can block decommissioning #5385

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 6, 2023

Conversation

siddhantsangwan
Copy link
Contributor

What changes were proposed in this pull request?

Mix of quasi-closed and unhealthy replicas blocks decommission even if sufficiently replicated.
a. Caused when only some of the replicas hit the error during write.
b. Can be fixed by removing this check:

if (!replicaSet.isHealthy()) {
          if (LOG.isDebugEnabled()) {
            unhealthyIDs.add(cid);
          }
          if (unhealthy < CONTAINER_DETAILS_LOGGING_LIMIT

However, simply removing that check is not a complete solution. We need to try and preserve any UNHEALTHY replicas that have the greatest Sequence ID. https://issues.apache.org/jira/browse/HDDS-9321 takes care of the Legacy Replication Manager side of things to preserve such UNHEALTHY replicas. This jira focuses on the decommissioning side.

Changes:

  1. Implemented isSufficientlyReplicatedForOffline in LegacyRatisContainerReplicaCount. The implementation checks getVulnerableUnhealthyReplicas for any UNHEALTHY replicas that need to saved.
  2. Implemented isHealthyEnoughForOffline in LegacyRatisContainerReplicaCount. This method does not check for UNHEALTHY replicas like isHealthy().
  3. DatanodeAdminMonitorImpl now uses the isHealthyEnoughForOffline API instead of isHealthy() so this check does not block decommissioning. Of course, decommissioning still remains blocked until isSufficientlyReplicatedForOffline is satisfied and vulnerable UNHEALTHY replicas have been replicated.

This PR only deals with the LegacyReplicationManager + decommissioning flow. I've added a defensive check for "hdds.scm.replication.enable.legacy" in the decommissioning code.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9354

How was this patch tested?

Added unit test.

…ntly replicated container can block decommissioning
.filter(r -> r.getDatanodeDetails().getPersistedOpState() == IN_SERVICE)
.filter(r -> r.getState() !=
ContainerReplicaProto.State.UNHEALTHY)
.allMatch(r -> ReplicationManager.compareState(
Copy link
Contributor

@sodonnel sodonnel Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure about this. What I think method is trying to say is:

Return true if there are some in service nodes that don't have an unhealthy replica?

So if we have:

Container: Closed
IN_SERVICE: Unhealthy
DECOMMISSIONING: Unhealthy

What do we expect here? The method will return true as it stands I think.

But this would return false:

Container: Closed
IN_SERVICE: Quasi_Closed
DECOMMISSIONING: Unhealthy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is a variation of isHealthy, whose purpose is to check if all replicas on in-service nodes are in the same state as the container:

  /**
   * Returns true if the container is healthy, meaning all replica which are not
   * in a decommission or maintenance state are in the same state as the
   * container and in QUASI_CLOSED or in CLOSED state.
   *
   * @return true if the container is healthy, false otherwise
   */
  default boolean isHealthy() {
    HddsProtos.LifeCycleState containerState = getContainer().getState();
    return (containerState == HddsProtos.LifeCycleState.CLOSED
        || containerState == HddsProtos.LifeCycleState.QUASI_CLOSED)
        && getReplicas().stream()
        .filter(r -> r.getDatanodeDetails().getPersistedOpState() == IN_SERVICE)
        .allMatch(r -> LegacyReplicationManager.compareState(
            containerState, r.getState()));

  }

isHealthyEnoughForOffline is intended to do the same, except it shouldn't let UNHEALTHY replicas block decommissioning.

That said, there are situations when we don't want to let UNHEALTHY replicas get decommissioned. For example, when we have a closed/quasi-closed container with only UNHEALTHY replicas. Or when there's a quasi-closed container with UNHEALTHY replicas having the greatest seq id. However this method isn't concerned with that. These cases should be handled by isSufficientlyReplicatedForOffline, whose job is to check if we have sufficient number of replicas.

So, I think for this case:

Container: Closed
IN_SERVICE: Unhealthy
DECOMMISSIONING: Unhealthy

It's ok for this method to return true. However isSufficientlyReplicatedForOffline should handle this if we don't have enough healthy replicas.

For:

Container: Closed
IN_SERVICE: Quasi_Closed
DECOMMISSIONING: Unhealthy

I think the method should return false, because the quasi-closed replica needs to be force closed first.

Copy link
Contributor Author

@siddhantsangwan siddhantsangwan Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, there are situations when we don't want to let UNHEALTHY replicas get decommissioned. For example, when we have a closed/quasi-closed container with only UNHEALTHY replicas.

What I mean here is that if we have a closed/quasi-container with only UNHEALTHY replicas, and not enough of them. Actually, I was just looking at the decommissioning flow for this situation. I think currently if we have only UNHEALTHY replicas and enough of them, decommissioning will still stay blocked because isSufficientlyReplicated doesn't count UNHEALTHY replicas unless the considerUnhealthy flag is true. This is the behaviour regardless of this change, and this change doesn't affect it. I think we'll have to look into this situation separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - I think this makes sense as it is then.

I think currently if we have only UNHEALTHY replicas and enough of them, decommissioning will still stay blocked because isSufficientlyReplicated doesn't count UNHEALTHY replicas unless the considerUnhealthy flag is true.

For the "new" RM, I think we are going to need to come up with the reconciliation job to fix the unhealthy containers, as its becoming impossible to reason about all these combination of unhealthy states!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the "new" RM, I think we are going to need to come up with the reconciliation job to fix the unhealthy containers, as its becoming impossible to reason about all these combination of unhealthy states!

I agree!

@@ -37,7 +38,8 @@ public interface ContainerReplicaCount {

boolean isSufficientlyReplicated();

boolean isSufficientlyReplicatedForOffline(DatanodeDetails datanode);
boolean isSufficientlyReplicatedForOffline(DatanodeDetails datanode,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should add some Java doc here to explain what this method is for, or down at the implementation level if the description is going to be different for Legacy, EC and Ratis Counts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll add javadoc once we've concluded our discussion above about isHealthyEnoughForOffline and isSufficientlyReplicatedForOffline.

@sodonnel
Copy link
Contributor

sodonnel commented Oct 3, 2023

Change looks largely good. I still have a query about that same method as before, and also a suggestion to add a bit of Java docs for the new methods on the replicaCount interface.

Copy link
Contributor

@sodonnel sodonnel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@siddhantsangwan siddhantsangwan merged commit 38b67b4 into apache:master Oct 6, 2023
@siddhantsangwan
Copy link
Contributor Author

Merged. Thanks @sodonnel for the quick reviews.

jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Feb 1, 2024
…f a sufficiently replicated container can block decommissioning (apache#5385)

(cherry picked from commit 38b67b4)
Change-Id: Ie3a3d47e8bb85afce9d406f201ebf88cd7c54f6e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants