HDDS-9151. Close EC Pipeline when container transitions to closing #5191

sodonnel · 2023-08-16T10:38:30Z

What changes were proposed in this pull request?

In testing we have found an issues in the ECWritableContainerProvider.

For EC a pipeline is used for only one container, when the container gets closed, the pipeline also gets closed. At the moment, the only place in the code which closes the EC piplines which no longer have an open container is inside the ECWritableContainerProvider. It first gets the list of open piplines and enforces the pipeline limit, then for all open pipelines, it tries top find one the client can use.

If the client has had problems writing to the pipelines (eg it was given a container/pipeline and then the write failed as the container was closed on the DN), the pipelines get added to the exclude list. Then we can get into a situation where many pipelines need to be closed on the write path, slowing down block allocation.

Ideally, when a container transitions to CLOSING in SCM, if the container is an EC container, we should also close the associated pipeline to avoid it counting toward the limit and to avoid needing to close it during the write (block allocation) path.

This could be achieved relatively simply inside the PipelineManagerImpl.removeContainersFromPipeline() method which is called as soon as the container transitions to CLOSING via ContainerStateManagerImpl.updateContainerState() when it executes the containerStateChangeActions. Wrapping the container close and pipeline close in a lock inside PipelineManagerImpl ensure we have a consistent "ec container close" flow and it should avoid the ECWritableContainerProvider needing to close the pipelines internally. However we can leave that code in place in ECWritableContainerProvider incase some pipelines slip through somehow.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9151

How was this patch tested?

New unit test added to ensure EC pipelines get closed and Ratis do not.

…tainerManager

guohao-rosicky · 2023-08-17T11:23:23Z

Thanks @sodonnel work on this, I think it's a good idea, this can close the pipeline in advance, so that the pipeline in the open state can be used to apply the ec block after getting it in advance.

siddhantsangwan

Reading this javadoc from PipelineStateManagerImpl#removeContainerFromPipeline:

        // Typically, SCM can send a pipeline close Action to datanode and
        // receive pipelineCloseAction to close the pipeline which will remove
        // the pipelineId both from the piplineStateMap as well as
        // pipeline2containerMap Subsequently, close container handler event can
        // also try to close the container as a part of which , it will also
        // try to remove the container from the pipeline2container Map which
        // will fail with PipelineNotFoundException. These are executed over
        // ratis, and if the exception is propagated to SCMStateMachine, it will
        // bring down the SCM. Ignoring it here.

The method that's annotated with @Replicate is updateContainerState and is executed over Ratis for all the SCMs.

The javadoc suggests that we also need to catch PipelineNotFoundException in PipelineManagerImpl#removeContainerFromPipeline when getting a pipeline:

Pipeline pipeline = stateManager.getPipeline(pipelineID);

since this also throws the same exception. Still trying to understand the Ratis part in-depth, but meanwhile what do you think?

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerManagerImpl.java

sodonnel · 2023-08-21T11:17:12Z

The javadoc suggests that we also need to catch PipelineNotFoundException in PipelineManagerImpl#removeContainerFromPipeline when getting a pipeline:
Pipeline pipeline = stateManager.getPipeline(pipelineID);

I think you are correct. It would be safer to catch the PNF exception from the getPipeline call too. I have updated it.

I didn't even notice the replicate annotation, as it was annotating the interface rather than the implementation, and I did not check the interface!

siddhantsangwan · 2023-08-23T05:39:19Z

...ds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/pipeline/TestPipelineManagerImpl.java

+      Assertions.assertEquals(1, pipelineManager
+          .getContainersInPipeline(ratisPipeline.getId()).size());
+      Assertions.assertEquals(1, pipelineManager
+          .getContainersInPipeline(ratisPipeline.getId()).size());


Should be ecPipeline.

Well spotted. I just pushed a commit to fix this.

siddhantsangwan

LGTM. Thanks for working on this!

…osing (apache#5191)" This reverts commit 77dc4db.

…osing (#5191)" (#5270) This reverts commit 77dc4db.

S O'Donnell added 3 commits August 16, 2023 11:17

HDDS-9151. Close EC Pipeline when container transitions to closing

d5f2b27

Change order of locking pipeline manager and container manager in con…

011fe93

…tainerManager

Fix findbugs

07763bf

siddhantsangwan reviewed Aug 17, 2023

View reviewed changes

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerManagerImpl.java Outdated Show resolved Hide resolved

Acquire locks outside of try block

6aa7fb7

Catch PNF exception when getting pipeline

56d5ac5

siddhantsangwan reviewed Aug 23, 2023

View reviewed changes

Fix ECPipeline in test

54333f0

siddhantsangwan approved these changes Aug 23, 2023

View reviewed changes

siddhantsangwan merged commit 77dc4db into apache:master Aug 23, 2023

sodonnel pushed a commit to sodonnel/hadoop-ozone that referenced this pull request Sep 11, 2023

Revert "HDDS-9151. Close EC Pipeline when container transitions to cl…

2acf8d4

…osing (apache#5191)" This reverts commit 77dc4db.

sodonnel mentioned this pull request Sep 11, 2023

Revert "HDDS-9151. Close EC Pipeline when container transitions to closing (#5191)" #5270

Merged

adoroszlai pushed a commit that referenced this pull request Sep 12, 2023

Revert "HDDS-9151. Close EC Pipeline when container transitions to cl…

5d5eb45

…osing (#5191)" (#5270) This reverts commit 77dc4db.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

HDDS-9151. Close EC Pipeline when container transitions to closing #5191

HDDS-9151. Close EC Pipeline when container transitions to closing #5191

Uh oh!

sodonnel commented Aug 16, 2023

Uh oh!

guohao-rosicky commented Aug 17, 2023

Uh oh!

siddhantsangwan left a comment

Uh oh!

Uh oh!

sodonnel commented Aug 21, 2023

Uh oh!

siddhantsangwan Aug 23, 2023

Uh oh!

sodonnel Aug 23, 2023

Uh oh!

siddhantsangwan left a comment

Uh oh!

Uh oh!

HDDS-9151. Close EC Pipeline when container transitions to closing #5191

HDDS-9151. Close EC Pipeline when container transitions to closing #5191

Uh oh!

Conversation

sodonnel commented Aug 16, 2023

What changes were proposed in this pull request?

What is the link to the Apache JIRA

How was this patch tested?

Uh oh!

guohao-rosicky commented Aug 17, 2023

Uh oh!

siddhantsangwan left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sodonnel commented Aug 21, 2023

Uh oh!

siddhantsangwan Aug 23, 2023

Choose a reason for hiding this comment

Uh oh!

sodonnel Aug 23, 2023

Choose a reason for hiding this comment

Uh oh!

siddhantsangwan left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!