-
Notifications
You must be signed in to change notification settings - Fork 537
HDDS-12078. Improve container reconciliation CLIs #7944
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: HDDS-10239-container-reconciliation
Are you sure you want to change the base?
HDDS-12078. Improve container reconciliation CLIs #7944
Conversation
…prove-cli * HDDS-10239-container-reconciliation: (113 commits) HDDS-12361. Mark testGetSnapshotDiffReportJob as flaky HDDS-12375. Random object created and used only once (apache#7933) HDDS-12335. Fix ozone admin namespace summary to give complete output (apache#7908) HDDS-12364. Require Override annotation for overridden methods (apache#7923) HDDS-11530. Support listMultipartUploads max uploads and markers (apache#7817) HDDS-11867. Remove code paths for non-Ratis SCM. (apache#7911) HDDS-12363. Add import options to IntelliJ IDEA style settings (apache#7921) HDDS-12215. Mark testContainerStateMachineRestartWithDNChangePipeline as flaky HDDS-12331. BlockOutputStream.failedServers is not thread-safe (apache#7885) HDDS-12188. Move server-only upgrade classes from hdds-common to hdds-server-framework (apache#7903) HDDS-12362. Remove temporary checkstyle suppression file (apache#7920) HDDS-12343. Fix spotbugs warnings in Recon (apache#7902) HDDS-12286. Fix license headers and imports for ozone-tools (apache#7919) HDDS-12284. Fix license headers and imports for ozone-s3-secret-store (apache#7917) HDDS-12275. Fix license headers and imports for ozone-integration-test (apache#7904) HDDS-12164. Rename and deprecate DFSConfigKeysLegacy config keys (apache#7803) HDDS-12283. Fix license headers and imports for ozone-recon-codegen (apache#7916) HDDS-12330. Convert Volume, Bucket and Key count to use comma separated numbers (apache#7881) HDDS-12281. Fix license headers and imports for ozone-filesystem-hadoop3 (apache#7913) HDDS-12285. Fix license headers and imports for ozone-s3gateway (apache#7918) ... Conflicts: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerReplicaInfo.java hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/server/JsonUtils.java hadoop-hdds/tools/src/main/java/org/apache/hadoop/hdds/scm/cli/container/ReconcileSubcommand.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer robot tests over integration tests. Overall seems fine, will go over a few more times.
@CommandLine.Parameters(description = "One or more container IDs separated by spaces. " + | ||
"To read from stdin, specify '-' and supply the container IDs " + | ||
"separated by newlines.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR, but we can look into brace expansion so that the CLI can specify a range {1...1000}. This can ease the scanning for status via CLI across many containers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not something Ozone CLI needs to implement. Shells like bash
and zsh
do that already with any command:
$ echo {1..5}
1 2 3 4 5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah having a "recipes" section or similar in our CLI docs would help to call out tricks like this one and some of the jq examples given in the PR description. Probably too verbose to put in our one-ish line CLI help messages and we don't have man pages, so I think website docs would be the place to add such details.
// processing the remaining containers. | ||
int invalidCount = 0; | ||
while (idIterator.hasNext()) { | ||
long containerID = Long.parseLong(idIterator.next()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be useful to catch NumberFormatException
and print a more friendly error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Forgot this wasn't being handled as a long automatically by picocli. Will add a test for this.
System.out.println("Use \"ozone admin container reconcile --status " + containerID + "\" to see the checksums of " + | ||
"each container replica"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a good way to indicate when reconciliation is completed... Maybe we add a reconciliation count as a metric to the container info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, some sort of feedback that it finished would be good. I was planning to revisit this with the SCM integration. For example, SCM may end up tracking the known in-flight reconciliations and could report the results.
Are you suggesting adding a jmx metric or just a counter in the code? I'm not sure exactly how this work, but I imagine the counter would get persisted on the DN side and SCM would see the updates. Users would then look for the count to increase when they send in a command.
// Read from stdin. | ||
idIterator = new Scanner(System.in, StandardCharsets.UTF_8.name()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can get rid of code duplicated from container InfoSubcommand
after #7970.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Do you want us to review and merge that one first?
Yes I will add robot tests as well. The current tests are unit tests without a mini ozone cluster. The scm client is mocked to return various combinations of replicas that would otherwise be hard to create in a mini ozone or docker cluster. |
What changes were proposed in this pull request?
Summary
Currently
ozone admin container reconcile
is used to trigger reconciliation for a specific container, but the resulting checksums are only shown inozone admin container info --json
which can be verbose and usually requiresjq
filtering to get the desired information.This PR adds new functionality to the
ozone admin container reconcile command
:ozone admin container reconcile 1 2 3 4
ozone admin container info
:cat <id-file> | ozone admin container reconcile -
--status
flag that prints replica and checksum information instead of sending reconcile commands:ozone admin container reconcile --status 1 2 3 4
replicasMatch
field is set totrue
orfalse
on the client side, which can be used to quickly filter out containers with mismatched replicas.No proto changes are made. The same
ScmClient#getContainer
andScmClient#getContainerReplicas
methods used byozone admin container info
are re-used forozone admin container reconcile --status
. This command just provides a more filtered view of the results specific to reconciliation and adds thereplicasMatch
field.Examples
Querying status of one container:
Querying the status of multiple containers (json list output):
Given a list of container IDs in a file, output the IDs of all the ones with mismatched replicas:
Same as above, but now send the IDs of mismatched containers back in for reconciliation (TODO clean up duplicate output):
What is the link to the Apache JIRA
HDDS-12078
How was this patch tested?