HDDS-9354. LegacyReplicationManager: Unhealthy replicas of a sufficiently replicated container can block decommissioning #5385
Conversation
.filter(r -> r.getDatanodeDetails().getPersistedOpState() == IN_SERVICE)
.filter(r -> r.getState() !=
    ContainerReplicaProto.State.UNHEALTHY)
.allMatch(r -> ReplicationManager.compareState(
I'm still not sure about this. What I think this method is trying to say is:
Return true if there are some in-service nodes that don't have an unhealthy replica?
So if we have:
Container: Closed
IN_SERVICE: Unhealthy
DECOMMISSIONING: Unhealthy
What do we expect here? The method will return true as it stands I think.
But this would return false:
Container: Closed
IN_SERVICE: Quasi_Closed
DECOMMISSIONING: Unhealthy
This method is a variation of isHealthy, whose purpose is to check if all replicas on in-service nodes are in the same state as the container:
/**
* Returns true if the container is healthy, meaning all replica which are not
* in a decommission or maintenance state are in the same state as the
* container and in QUASI_CLOSED or in CLOSED state.
*
* @return true if the container is healthy, false otherwise
*/
default boolean isHealthy() {
  HddsProtos.LifeCycleState containerState = getContainer().getState();
  return (containerState == HddsProtos.LifeCycleState.CLOSED
      || containerState == HddsProtos.LifeCycleState.QUASI_CLOSED)
      && getReplicas().stream()
      .filter(r -> r.getDatanodeDetails().getPersistedOpState() == IN_SERVICE)
      .allMatch(r -> LegacyReplicationManager.compareState(
          containerState, r.getState()));
}
isHealthyEnoughForOffline is intended to do the same, except it shouldn't let UNHEALTHY replicas block decommissioning.
That said, there are situations when we don't want to let UNHEALTHY replicas get decommissioned. For example, when we have a closed/quasi-closed container with only UNHEALTHY replicas. Or when there's a quasi-closed container with UNHEALTHY replicas having the greatest seq id. However, this method isn't concerned with that. These cases should be handled by isSufficientlyReplicatedForOffline, whose job is to check whether we have a sufficient number of replicas.
So, I think for this case:
Container: Closed
IN_SERVICE: Unhealthy
DECOMMISSIONING: Unhealthy
It's ok for this method to return true. However, isSufficientlyReplicatedForOffline should handle this if we don't have enough healthy replicas.
For:
Container: Closed
IN_SERVICE: Quasi_Closed
DECOMMISSIONING: Unhealthy
I think the method should return false, because the quasi-closed replica needs to be force closed first.
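A minimal sketch of the shape being described, combining the diff lines above with the isHealthy() body (illustrative only, not necessarily the exact code in this PR; whether it ends up as a default method or as an override in LegacyRatisContainerReplicaCount isn't shown here):

/**
 * Sketch: same as isHealthy(), except UNHEALTHY replicas are filtered out
 * before the state comparison, so they alone can't block decommissioning.
 */
default boolean isHealthyEnoughForOffline() {
  HddsProtos.LifeCycleState containerState = getContainer().getState();
  return (containerState == HddsProtos.LifeCycleState.CLOSED
      || containerState == HddsProtos.LifeCycleState.QUASI_CLOSED)
      && getReplicas().stream()
      .filter(r -> r.getDatanodeDetails().getPersistedOpState() == IN_SERVICE)
      .filter(r -> r.getState() !=
          ContainerReplicaProto.State.UNHEALTHY)
      .allMatch(r -> ReplicationManager.compareState(
          containerState, r.getState()));
}

With this shape, the all-UNHEALTHY example returns true because the stream is empty after filtering, while the IN_SERVICE Quasi_Closed example returns false because the state comparison against the CLOSED container fails.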
That said, there are situations when we don't want to let UNHEALTHY replicas get decommissioned. For example, when we have a closed/quasi-closed container with only UNHEALTHY replicas.
What I mean here is that if we have a closed/quasi-closed container with only UNHEALTHY replicas, and not enough of them. Actually, I was just looking at the decommissioning flow for this situation. I think currently, if we have only UNHEALTHY replicas and enough of them, decommissioning will still stay blocked, because isSufficientlyReplicated doesn't count UNHEALTHY replicas unless the considerUnhealthy flag is true. This is the behaviour regardless of this change, and this change doesn't affect it. I think we'll have to look into this situation separately.
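A rough sketch of the counting behaviour being described (illustrative only; the variable names are made up and this is not the actual isSufficientlyReplicated code):

// With considerUnhealthy == false, UNHEALTHY replicas don't count towards the
// replication factor, so an all-UNHEALTHY container never looks sufficiently
// replicated and the decommissioning node stays pending.
long countedReplicas = replicas.stream()
    .filter(r -> considerUnhealthy
        || r.getState() != ContainerReplicaProto.State.UNHEALTHY)
    .count();
boolean sufficientlyReplicated = countedReplicas >= replicationFactor;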
OK - I think this makes sense as it is then.
I think currently if we have only UNHEALTHY replicas and enough of them, decommissioning will still stay blocked because isSufficientlyReplicated doesn't count UNHEALTHY replicas unless the considerUnhealthy flag is true.
For the "new" RM, I think we are going to need to come up with the reconciliation job to fix the unhealthy containers, as its becoming impossible to reason about all these combination of unhealthy states!
For the "new" RM, I think we are going to need to come up with the reconciliation job to fix the unhealthy containers, as its becoming impossible to reason about all these combination of unhealthy states!
I agree!
@@ -37,7 +38,8 @@ public interface ContainerReplicaCount {

   boolean isSufficientlyReplicated();

-  boolean isSufficientlyReplicatedForOffline(DatanodeDetails datanode);
+  boolean isSufficientlyReplicatedForOffline(DatanodeDetails datanode,
Maybe we should add some Java doc here to explain what this method is for, or down at the implementation level if the description is going to be different for Legacy, EC and Ratis Counts?
Sure, I'll add javadoc once we've concluded our discussion above about isHealthyEnoughForOffline and isSufficientlyReplicatedForOffline.
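For example, something along these lines might work for isSufficientlyReplicatedForOffline (the wording is only a suggestion, and it's shown over the single-argument form because the second parameter is truncated in the diff above):

/**
 * Checks whether the container has enough replicas, not counting the replica
 * on the given datanode, for that datanode to safely enter decommission or
 * maintenance. For the Legacy flow this should also require that any
 * vulnerable UNHEALTHY replicas have been replicated elsewhere first.
 *
 * @param datanode the datanode being taken offline
 * @return true if the datanode can go offline without under-replicating
 *         the container
 */
boolean isSufficientlyReplicatedForOffline(DatanodeDetails datanode);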
Change looks largely good. I still have a query about that same method as before, and also a suggestion to add a bit of Javadoc for the new methods on the replicaCount interface.
LGTM
Merged. Thanks @sodonnel for the quick reviews.
…f a sufficiently replicated container can block decommissioning (apache#5385) (cherry picked from commit 38b67b4) Change-Id: Ie3a3d47e8bb85afce9d406f201ebf88cd7c54f6e
What changes were proposed in this pull request?
A mix of quasi-closed and unhealthy replicas blocks decommissioning even when the container is sufficiently replicated.
a. Caused when only some of the replicas hit the error during write.
b. Can be fixed by removing this check:
However, simply removing that check is not a complete solution. We need to try and preserve any UNHEALTHY replicas that have the greatest Sequence ID. https://issues.apache.org/jira/browse/HDDS-9321 takes care of the Legacy Replication Manager side of things to preserve such UNHEALTHY replicas. This jira focuses on the decommissioning side.
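As a rough illustration of which UNHEALTHY replicas need preserving (names such as getSequenceId are assumptions here; HDDS-9321 defines the real selection logic):

// Sketch: UNHEALTHY replicas carrying the container's highest sequence id
// must not be lost, because they may hold the most recent data.
long highestSequenceId = replicas.stream()
    .mapToLong(r -> r.getSequenceId())
    .max()
    .orElse(0);
List<ContainerReplica> mustPreserve = replicas.stream()
    .filter(r -> r.getState() == ContainerReplicaProto.State.UNHEALTHY)
    .filter(r -> r.getSequenceId() == highestSequenceId)
    .collect(Collectors.toList());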
Changes:
1. Added isSufficientlyReplicatedForOffline in LegacyRatisContainerReplicaCount. The implementation checks getVulnerableUnhealthyReplicas for any UNHEALTHY replicas that need to be saved.
2. Added isHealthyEnoughForOffline in LegacyRatisContainerReplicaCount. Unlike isHealthy(), this method does not take UNHEALTHY replicas into account.
3. DatanodeAdminMonitorImpl now uses the isHealthyEnoughForOffline API instead of isHealthy(), so this check does not block decommissioning. Of course, decommissioning still remains blocked until isSufficientlyReplicatedForOffline is satisfied and vulnerable UNHEALTHY replicas have been replicated (see the sketch below).

This PR only deals with the LegacyReplicationManager + decommissioning flow. I've added a defensive check for "hdds.scm.replication.enable.legacy" in the decommissioning code.
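A rough sketch of the resulting readiness check in the decommission flow (illustrative pseudo-flow, not the actual DatanodeAdminMonitorImpl code; the second argument of isSufficientlyReplicatedForOffline is elided because it is truncated in the diff above):

// Health gate: UNHEALTHY replicas alone no longer fail this check, but e.g. an
// IN_SERVICE quasi-closed replica of a closed container still does.
boolean healthyEnough = replicaCount.isHealthyEnoughForOffline();

// Replication gate: the node stays pending until there are enough good copies
// and vulnerable UNHEALTHY replicas have been re-replicated.
boolean sufficientlyReplicated =
    replicaCount.isSufficientlyReplicatedForOffline(datanode /* , ... */);

boolean readyForOffline = healthyEnough && sufficientlyReplicated;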
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-9354
How was this patch tested?
Added unit test.