HDDS-9354. LegacyReplicationManager: Unhealthy replicas of a sufficiently replicated container can block decommissioning #5385

siddhantsangwan · 2023-10-03T10:26:50Z

What changes were proposed in this pull request?

Mix of quasi-closed and unhealthy replicas blocks decommission even if sufficiently replicated.
a. Caused when only some of the replicas hit the error during write.
b. Can be fixed by removing this check:

if (!replicaSet.isHealthy()) {
          if (LOG.isDebugEnabled()) {
            unhealthyIDs.add(cid);
          }
          if (unhealthy < CONTAINER_DETAILS_LOGGING_LIMIT

However, simply removing that check is not a complete solution. We need to try and preserve any UNHEALTHY replicas that have the greatest Sequence ID. https://issues.apache.org/jira/browse/HDDS-9321 takes care of the Legacy Replication Manager side of things to preserve such UNHEALTHY replicas. This jira focuses on the decommissioning side.

Changes:

Implemented isSufficientlyReplicatedForOffline in LegacyRatisContainerReplicaCount. The implementation checks getVulnerableUnhealthyReplicas for any UNHEALTHY replicas that need to saved.
Implemented isHealthyEnoughForOffline in LegacyRatisContainerReplicaCount. This method does not check for UNHEALTHY replicas like isHealthy().
DatanodeAdminMonitorImpl now uses the isHealthyEnoughForOffline API instead of isHealthy() so this check does not block decommissioning. Of course, decommissioning still remains blocked until isSufficientlyReplicatedForOffline is satisfied and vulnerable UNHEALTHY replicas have been replicated.

This PR only deals with the LegacyReplicationManager + decommissioning flow. I've added a defensive check for "hdds.scm.replication.enable.legacy" in the decommissioning code.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9354

How was this patch tested?

Added unit test.

…ntly replicated container can block decommissioning

sodonnel · 2023-10-03T14:28:56Z

.../java/org/apache/hadoop/hdds/scm/container/replication/LegacyRatisContainerReplicaCount.java

+        .filter(r -> r.getDatanodeDetails().getPersistedOpState() == IN_SERVICE)
+        .filter(r -> r.getState() !=
+            ContainerReplicaProto.State.UNHEALTHY)
+        .allMatch(r -> ReplicationManager.compareState(


I'm still not sure about this. What I think method is trying to say is:

Return true if there are some in service nodes that don't have an unhealthy replica?

So if we have:

Container: Closed IN_SERVICE: Unhealthy DECOMMISSIONING: Unhealthy

What do we expect here? The method will return true as it stands I think.

But this would return false:

Container: Closed IN_SERVICE: Quasi_Closed DECOMMISSIONING: Unhealthy

This method is a variation of isHealthy, whose purpose is to check if all replicas on in-service nodes are in the same state as the container:

/** * Returns true if the container is healthy, meaning all replica which are not * in a decommission or maintenance state are in the same state as the * container and in QUASI_CLOSED or in CLOSED state. * * @return true if the container is healthy, false otherwise */ default boolean isHealthy() { HddsProtos.LifeCycleState containerState = getContainer().getState(); return (containerState == HddsProtos.LifeCycleState.CLOSED || containerState == HddsProtos.LifeCycleState.QUASI_CLOSED) && getReplicas().stream() .filter(r -> r.getDatanodeDetails().getPersistedOpState() == IN_SERVICE) .allMatch(r -> LegacyReplicationManager.compareState( containerState, r.getState())); }

isHealthyEnoughForOffline is intended to do the same, except it shouldn't let UNHEALTHY replicas block decommissioning.

That said, there are situations when we don't want to let UNHEALTHY replicas get decommissioned. For example, when we have a closed/quasi-closed container with only UNHEALTHY replicas. Or when there's a quasi-closed container with UNHEALTHY replicas having the greatest seq id. However this method isn't concerned with that. These cases should be handled by isSufficientlyReplicatedForOffline, whose job is to check if we have sufficient number of replicas.

So, I think for this case:

Container: Closed IN_SERVICE: Unhealthy DECOMMISSIONING: Unhealthy

It's ok for this method to return true. However isSufficientlyReplicatedForOffline should handle this if we don't have enough healthy replicas.

For:

Container: Closed IN_SERVICE: Quasi_Closed DECOMMISSIONING: Unhealthy

I think the method should return false, because the quasi-closed replica needs to be force closed first.

That said, there are situations when we don't want to let UNHEALTHY replicas get decommissioned. For example, when we have a closed/quasi-closed container with only UNHEALTHY replicas.

What I mean here is that if we have a closed/quasi-container with only UNHEALTHY replicas, and not enough of them. Actually, I was just looking at the decommissioning flow for this situation. I think currently if we have only UNHEALTHY replicas and enough of them, decommissioning will still stay blocked because isSufficientlyReplicated doesn't count UNHEALTHY replicas unless the considerUnhealthy flag is true. This is the behaviour regardless of this change, and this change doesn't affect it. I think we'll have to look into this situation separately.

OK - I think this makes sense as it is then.

I think currently if we have only UNHEALTHY replicas and enough of them, decommissioning will still stay blocked because isSufficientlyReplicated doesn't count UNHEALTHY replicas unless the considerUnhealthy flag is true.

For the "new" RM, I think we are going to need to come up with the reconciliation job to fix the unhealthy containers, as its becoming impossible to reason about all these combination of unhealthy states!

For the "new" RM, I think we are going to need to come up with the reconciliation job to fix the unhealthy containers, as its becoming impossible to reason about all these combination of unhealthy states!

I agree!

sodonnel · 2023-10-03T14:33:42Z

...cm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerReplicaCount.java

@@ -37,7 +38,8 @@ public interface ContainerReplicaCount {

  boolean isSufficientlyReplicated();

-  boolean isSufficientlyReplicatedForOffline(DatanodeDetails datanode);
+  boolean isSufficientlyReplicatedForOffline(DatanodeDetails datanode,


Maybe we should add some Java doc here to explain what this method is for, or down at the implementation level if the description is going to be different for Legacy, EC and Ratis Counts?

Sure, I'll add javadoc once we've concluded our discussion above about isHealthyEnoughForOffline and isSufficientlyReplicatedForOffline.

sodonnel · 2023-10-03T14:35:45Z

Change looks largely good. I still have a query about that same method as before, and also a suggestion to add a bit of Java docs for the new methods on the replicaCount interface.

sodonnel

LGTM

siddhantsangwan · 2023-10-06T04:27:04Z

Merged. Thanks @sodonnel for the quick reviews.

…f a sufficiently replicated container can block decommissioning (apache#5385) (cherry picked from commit 38b67b4) Change-Id: Ie3a3d47e8bb85afce9d406f201ebf88cd7c54f6e

HDDS-9354. LegacyReplicationManager: Unhealthy replicas of a sufficie…

316723d

…ntly replicated container can block decommissioning

siddhantsangwan mentioned this pull request Oct 3, 2023

HDDS-9354. LegacyReplicationManager: Unhealthy replicas of a sufficiently replicated container can block decommissioning #5369

Closed

siddhantsangwan requested review from sodonnel and errose28 October 3, 2023 10:31

sodonnel reviewed Oct 3, 2023

View reviewed changes

add javadoc

fbafdbd

sodonnel approved these changes Oct 4, 2023

View reviewed changes

siddhantsangwan merged commit 38b67b4 into apache:master Oct 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

HDDS-9354. LegacyReplicationManager: Unhealthy replicas of a sufficiently replicated container can block decommissioning #5385

HDDS-9354. LegacyReplicationManager: Unhealthy replicas of a sufficiently replicated container can block decommissioning #5385

Uh oh!

siddhantsangwan commented Oct 3, 2023

Uh oh!

sodonnel Oct 3, 2023 •

edited

Loading

Uh oh!

siddhantsangwan Oct 4, 2023

Uh oh!

siddhantsangwan Oct 4, 2023 •

edited

Loading

Uh oh!

sodonnel Oct 4, 2023

Uh oh!

siddhantsangwan Oct 4, 2023

Uh oh!

sodonnel Oct 3, 2023

Uh oh!

siddhantsangwan Oct 4, 2023

Uh oh!

sodonnel commented Oct 3, 2023

Uh oh!

sodonnel left a comment

Uh oh!

siddhantsangwan commented Oct 6, 2023

Uh oh!

Uh oh!

HDDS-9354. LegacyReplicationManager: Unhealthy replicas of a sufficiently replicated container can block decommissioning #5385

HDDS-9354. LegacyReplicationManager: Unhealthy replicas of a sufficiently replicated container can block decommissioning #5385

Uh oh!

Conversation

siddhantsangwan commented Oct 3, 2023

What changes were proposed in this pull request?

What is the link to the Apache JIRA

How was this patch tested?

Uh oh!

sodonnel Oct 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

siddhantsangwan Oct 4, 2023

Choose a reason for hiding this comment

Uh oh!

siddhantsangwan Oct 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sodonnel Oct 4, 2023

Choose a reason for hiding this comment

Uh oh!

siddhantsangwan Oct 4, 2023

Choose a reason for hiding this comment

Uh oh!

sodonnel Oct 3, 2023

Choose a reason for hiding this comment

Uh oh!

siddhantsangwan Oct 4, 2023

Choose a reason for hiding this comment

Uh oh!

sodonnel commented Oct 3, 2023

Uh oh!

sodonnel left a comment

Choose a reason for hiding this comment

Uh oh!

siddhantsangwan commented Oct 6, 2023

Uh oh!

Uh oh!

sodonnel Oct 3, 2023 •

edited

Loading

siddhantsangwan Oct 4, 2023 •

edited

Loading