HDDS-11463. Track and display failed DataNode storage locations in SCM. #7266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Conversation
@errose28 Could you please help review this PR? Thank you very much! We discussed the relevant implementation together in HDDS-11463.
Thanks for working on this @slfan1989, this looks like a useful addition. I only had time for a quick high level look for now.
 * Handler of ozone admin scm volumesfailure command.
 */
@Command(
    name = "volumesfailure",
For the CLI, we should probably use something like ozone admin datanode volume list. The datanode subcommand is already used to retrieve information about datanodes from SCM. Splitting the commands so that volume has its own subcommand gives us more options in the future.
To distinguish failed and healthy volumes and filter out different nodes, we can either add some kind of filter flag, or leave it up to grep/jq to be applied to the output.
This also means we should make the RPC more generic to support pulling all volume information.
Thank you for helping to review this PR! I will continue to improve the relevant code based on your suggestions.
@@ -382,6 +383,7 @@ public abstract static class Builder<T extends Builder<T>> {
    private boolean failedVolume = false;
    private String datanodeUuid;
    private String clusterID;
    private long failureDate;
Let's use failureTime. I'm assuming this is being stored as millis since epoch, so it will have date and time information.
I have improved the relevant code.
// Ensure it is set only once,
// which is the time when the failure was first detected.
if (failureDate == 0L) {
  setFailureDate(Time.now());
Let's use Instant.now() per HDDS-7911.
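The set-only-once semantics being discussed can be sketched as follows. This is a minimal illustration, not the PR's actual code: FailureTracker and markFailed are hypothetical names, and it assumes the failure time is stored as epoch millis obtained via Instant.now() as suggested above.

```java
import java.time.Instant;

public class FailureTracker {
    // Epoch millis of the first detected failure; 0 means "never failed".
    private long failureTime;

    // Record the failure time only once, when the failure is first
    // detected; later detections of the same volume keep the original
    // timestamp instead of overwriting it.
    public void markFailed() {
        if (failureTime == 0L) {
            failureTime = Instant.now().toEpochMilli();
        }
    }

    public long getFailureTime() {
        return failureTime;
    }
}
```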
@errose28 Can you help review this PR again? Thank you very much!
Thanks @slfan1989 for working on this. Converted it to draft because there is a failing test:
https://github.com/slfan1989/ozone/actions/runs/11471452180
@adoroszlai Thank you for reviewing this PR! I am currently making improvements, and once the changes pass the CI tests in my branch, I will reopen the PR. cc: @errose28
@adoroszlai Thank you for reviewing this PR! I will also pay closer attention to CI issues in future development. I understand that CI testing resources are valuable. I have made improvements to the code based on @errose28's suggestions and also fixed the related unit test errors. The CI for my branch has passed (https://github.com/slfan1989/ozone/actions/runs/11719380711), and I have updated the PR status to "Ready for Review".
@errose28 Could you please help review this PR again? Thank you very much! I've made some additional improvements to this PR, as we wanted to print all the disk information. However, since there's quite a lot of disk data, I've added pagination functionality.
Temporarily converted to draft and assigned to myself, to resolve conflicts.
@adoroszlai Thank you for your attention to this PR. I will continue to follow up on it.
Previously the license header was a javadoc in this new file, so the problem was hidden.
Thanks @slfan1989 for the patch.
private String uuid;

// The HostName identifier of the DataNode.
@Option(names = { "--hostName" },
Please avoid camelCase for options. (Also for --displayMode.)
- @Option(names = { "--hostName" },
+ @Option(names = { "--hostname" },
This still applies to the latest patch.
// PageSize refers to the number of items displayed per page
// in a paginated view.
@Option(names = { "--pageSize" },
    defaultValue = "20",
    description = "The number of volume information items displayed per page.")
private int pageSize;

// The current page.
@Option(names = { "--currentPage" },
    defaultValue = "1",
    description = "The current page.")
private int currentPage;
Other Ozone CLI commands allow pagination (via the ListOptions reusable mixin) with --start to specify the start item, --length for "page size", and --all (only one of these last two is allowed). Please try to use options consistent with that. Using ListOptions directly may not work, since --prefix does not seem to be applicable here.
Other Ozone CLI commands allow pagination (via ListOptions reusable mixin) with --start to specify the start item, --length for "page size", and --all (only one of these last two are allowed). Please try to use options consistent with that.
I have thoroughly reviewed ListOptions, and it is indeed a powerful tool class. However, my requirements differ slightly from its original functionality. I want to implement a pagination display, as there may sometimes be a certain number of damaged disks. For example, if there are 30 damaged disks and the limit is set to 10, the currentPage can range from 1 to 3, allowing for pagination. In this case, currentPage represents the current page number. Therefore, I plan to add the currentPage option to ListOptions to meet this need.
Using ListOption directly may not work, since --prefix does not seem to be applicable here.
The --start and --prefix options have limited relevance for our functionality. Currently, we support filtering by hostName, which I believe should be sufficient. I'm unsure whether we need to filter by disk letter as well. What do you think?
The problem with numeric "current page" is that the list is not fixed. By the time you request the next page, it may be shorter or longer. Then you may get the same item twice (if new item is added in the range of the earlier pages) or not at all (if item is removed from earlier pages).
Using an anchor item eliminates those problems by allowing the size of "previous pages" to change.
Compare GitHub's list of commits, which uses a commit SHA as anchor, to its list of pull requests, which uses a simple numeric page:
https://github.com/apache/ozone/pulls?page=2
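The difference between the two schemes can be sketched as follows. This is an illustrative snippet with hypothetical names (AnchorPagination, page), assuming volumes are listed in a stable sort order: the page boundary is an item rather than a numeric offset, so items added or removed in earlier pages cannot shift the next page.

```java
import java.util.List;

public class AnchorPagination {
    // Return up to 'length' items strictly after the anchor 'start';
    // a null anchor means "from the beginning". The caller passes the
    // last item of the previous page as the anchor for the next page.
    public static List<String> page(List<String> sorted, String start, int length) {
        int from = sorted.size();
        for (int i = 0; i < sorted.size(); i++) {
            if (start == null || sorted.get(i).compareTo(start) > 0) {
                from = i;
                break;
            }
        }
        return sorted.subList(from, Math.min(from + length, sorted.size()));
    }

    public static void main(String[] args) {
        List<String> vols = List.of("data0", "data1", "data2", "data3");
        System.out.println(page(vols, null, 2));      // first page
        System.out.println(page(vols, "data1", 2));   // next page, anchored at data1
    }
}
```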
Created HDDS-12995 to move out --prefix, to allow reusing ListOptions for lists that are filtered by other parameters.
Thank you for your detailed explanation! I now understand the reliability of using anchor points and plan to make the corresponding improvements in the code. Regarding HDDS-12995, I understand that you mean to move the --prefix option from its current position, specifically placing it within the command, rather than having it as a global option. Am I understanding this correctly?
Yes.
// Display it in JSON format.
@Option(names = { "--json" },
    defaultValue = "false",
    description = "Format output as JSON.")
private boolean json;

// Display it in TABLE format.
@Option(names = { "--table" },
    defaultValue = "false",
    description = "Format output as Table.")
private boolean table;
Please make these exclusive, like:
ozone/hadoop-ozone/cli-shell/src/main/java/org/apache/hadoop/ozone/shell/ListOptions.java
Lines 27 to 28 in af1f98c

@CommandLine.ArgGroup(exclusive = true)
private ExclusiveLimit exclusiveLimit = new ExclusiveLimit();

ozone/hadoop-ozone/cli-shell/src/main/java/org/apache/hadoop/ozone/shell/ListOptions.java
Lines 63 to 74 in af1f98c

static class ExclusiveLimit {
  @CommandLine.Option(names = {"--length", "-l"},
      description = "Maximum number of items to list",
      defaultValue = "100",
      showDefaultValue = CommandLine.Help.Visibility.ALWAYS)
  private int limit;

  @CommandLine.Option(names = {"--all", "-a"},
      description = "List all results",
      defaultValue = "false")
  private boolean all;
}
That's a good suggestion! I will improve the code based on this recommendation.
description = "failed is used to display failed disks, " +
    "normal is used to display normal disks.")
description seems to be a dup of --displayMode. (Same for --hostName.)
I will improve the help information for displayMode, hostname, and uuid.
@Option(names = { "--displayMode" },
    defaultValue = "all",
    description = "failed is used to display failed disks, " +
        "normal is used to display normal disks.")
private String displayMode;
Please use an enum to limit the values allowed: https://picocli.info/#_enum_types
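The idea can be sketched like this. The enum and filter helper below are hypothetical names, not the PR's code; once the option field's type is an enum, picocli rejects any value outside the declared constants at parse time, so the filtering code only ever sees valid states.

```java
import java.util.List;
import java.util.stream.Collectors;

public class VolumeStateFilter {
    // Allowed values for the option; an enum-typed picocli field makes
    // the parser reject anything outside this set automatically.
    public enum State { ALL, HEALTHY, FAILED }

    public record Volume(String name, boolean failed) { }

    // Filter volumes according to the requested state: ALL keeps
    // everything, FAILED keeps failed volumes, HEALTHY keeps the rest.
    public static List<Volume> filter(List<Volume> volumes, State state) {
        return volumes.stream()
            .filter(v -> state == State.ALL || (state == State.FAILED) == v.failed)
            .collect(Collectors.toList());
    }
}
```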
System.setOut(new PrintStream(outContent, false, DEFAULT_ENCODING));
System.setErr(new PrintStream(errContent, false, DEFAULT_ENCODING));
Do we need to manually set out/err when using SystemOutCapturer?
This is a good question! After using SystemOutCapturer, we no longer need to manually set out/err. I have already improved this part of the code.
private HddsProtos.DatanodeDetailsProto createDatanodeDetails() {
  Random random = ThreadLocalRandom.current();
  String ipAddress = random.nextInt(256)
      + "." + random.nextInt(256)
      + "." + random.nextInt(256)
      + "." + random.nextInt(256);

  DatanodeDetails.Builder dn = DatanodeDetails.newBuilder()
      .setUuid(UUID.randomUUID())
      .setHostName("localhost" + "-" + ipAddress)
      .setIpAddress(ipAddress)
      .setPersistedOpState(HddsProtos.NodeOperationalState.IN_SERVICE)
      .setPersistedOpStateExpiry(0);

  for (DatanodeDetails.Port.Name name : ALL_PORTS) {
    dn.addPort(DatanodeDetails.newPort(name, 0));
  }
Use MockDatanodeDetails.
required string uuid = 1;
required string hostName = 2;
required string volumeName = 3;
required bool failed = 4;
required int64 failureTime = 5;
required int64 capacity = 6;
@@ -49,7 +51,7 @@ public class DatanodeInfo extends DatanodeDetails {
  private volatile long lastHeartbeatTime;
  private long lastStatsUpdatedTime;
  private int failedVolumeCount;

  private List<VolumeInfoProto> volumeInfos;
volumeInfos is never assigned. I think the intention was to update this variable in updateStorageReports, in the block holding writeLock.
Thank you for pointing out this issue! I hadn't noticed these details before.
name = "volumes",
description = "Display the list of volumes on the DataNode.",
mixinStandardHelpOptions = true,
versionProvider = HddsVersionProvider.class)
public class VolumeSubCommand extends ScmSubcommand {
Currently this subcommand is dangling, not added to any parent. If you add it somewhere, it will be available as ozone ... volumes. Suggestion from @errose28 was to add this as ozone admin datanode volume list, which allows adding other subcommands related to datanode volumes later.
Subcommands are added like this:
Lines 33 to 40 in af1f98c

subcommands = {
    ListInfoSubcommand.class,
    DecommissionSubCommand.class,
    MaintenanceSubCommand.class,
    RecommissionSubCommand.class,
    StatusSubCommand.class,
    UsageInfoSubcommand.class
})
@adoroszlai Thank you very much for reviewing this PR! I will continue to improve it.
@adoroszlai Could you help review this PR? Thank you very much! I have conducted tests on my own branch, and it currently passes the key CI tests. https://github.com/slfan1989/ozone/actions/runs/15209585380
// If startItem is specified, find its position in the volumeInfos list
int startIndex = 0;
if (StringUtils.isNotBlank(startItem)) {
@adoroszlai I added logic to skip startItem in this part of the code, but after thinking it through, I realized it's better to use the server's hostname or UUID as startItem instead of a disk prefix. That's because many machines name their disks like data0 to data9, and using a disk name could lead to unexpected filtering behavior.
@adoroszlai Can we move forward with this PR? I would appreciate your advice.
Thanks @slfan1989 for updating the patch.
private static final Codec<VolumeInfo> CODEC = new DelegatedCodec<>(
    Proto2Codec.get(HddsProtos.VolumeInfoProto.getDefaultInstance()),
    VolumeInfo::fromProtobuf,
    VolumeInfo::getProtobuf,
    VolumeInfo.class);

public static Codec<VolumeInfo> getCodec() {
  return CODEC;
}
Codec is required only for storing in DB, but VolumeInfo does not seem to be persisted by either datanode or SCM. So I think this can be removed.
I have removed the CODEC.
@@ -304,6 +304,15 @@ message RemoveScmResponseProto {
  optional string scmId = 2;
}

message VolumeInfoProto {
  optional string uuid = 1;
Please use DatanodeIDProto.
 */
public final class VolumeInfo implements Comparable<VolumeInfo> {

  private String uuid;
Please use DatanodeID.
@Override
public int compareTo(VolumeInfo that) {
  Preconditions.checkNotNull(that);
nit: prefer the builtin Objects.requireNonNull.
 * @throws IOException
 *     I/O exceptions that may occur during the process of querying the volume.
 */
StorageContainerLocationProtocolProtos.GetVolumeInfosResponseProto getVolumeInfos(
nit: please import GetVolumeInfosResponseProto instead of StorageContainerLocationProtocolProtos.
private String uuid;

// The HostName identifier of the DataNode.
@Option(names = { "--hostName" },
This still applies to the latest patch.
@CommandLine.Mixin
private ListPaginationOptions listOptions;

enum DISPLAYMODE { all, normal, failed }
nit: Enums should be named like other types (classes, interfaces): DisplayMode. Also, please consider using all-caps for values (ALL, etc.)
@Option(names = { "--displayMode" },
    defaultValue = "all",
    description = "Display mode for disks: 'failed' shows failed disks, " +
        "'normal' shows healthy disks, 'all' shows all disks.")
private DISPLAYMODE displayMode;
On another look, I think "display mode" is confusing. JSON and Table are display modes; "all/normal/failed" filters the list by volume state. I suggest renaming the option to --state, and renaming normal to healthy. Then the description can be simplified to "Filter disks by state".
// If displayed in JSON format.
if (json) {
  System.out.print(JsonUtils.toJsonStringWithDefaultPrettyPrinter(volumeInfos));
This is still applicable.
Also, please use println, to avoid situations like HDDS-13100.
@AfterEach
public void tearDown() {
}
nit: unnecessary
Hi @slfan1989, thanks for working on this change. I think there are three attributes being added here which should be reviewed separately:
1. The RPC to retrieve volume information is definitely required going forward, regardless of the other two items, to create some sort of CLI to query volume state.
2. Tracking the failure time of the volume seems like a somewhat invasive change, since it spans the datanode, heartbeat, and SCM. Is this necessary, or is it enough to depend on a metrics database to track the timing of cluster events? Of course we need improvements to our volume metrics as well, as mentioned in #8405.
3. On the CLI front, I do think we need a dedicated
@errose28 Thank you for your message! I'd like to share some thoughts from a different perspective.

As it stands, this feature does not conflict with the proposal in #8405. #8405 represents a more innovative and forward-looking design, and although it's still under discussion, it will certainly be valuable if implemented as planned. At the same time, I believe this feature does not impact HDDS-13096 or HDDS-13097. My comment on #8405 was more about expressing expectations for the system's future capabilities (I hope Ozone can gradually support such features) rather than raising any objections to #8405 itself.

The design of #7266 is inspired by HDFS's disk failure detection mechanism, with the goal of improving the system's ability to identify and locate failed disks. For users migrating from HDFS to Ozone, using the volume command to directly view failed disks can offer a more intuitive and convenient operational experience.

From my perspective, we all play different roles in this project. Your team focuses on evolving and optimizing the system's architecture, while we, as external users, are more focused on refining specific functional details based on real-world use. Ultimately, however, we share the same goal: to make Ozone more robust, more user-friendly, and more widely adopted. Naturally, it's not easy to fully align these detail-oriented changes with larger, ongoing feature developments, for example making #7266 fully consistent with #8405. This is mainly because #8405 is broader in scope, with a longer timeline, whereas #7266 focuses on a very specific aspect. While we fully respect the overall direction, we also hope to move forward with some smaller, incremental improvements to address current practical issues.

In addition to this PR, we're also working on several other enhancements. For instance, we've implemented mechanisms to collect DataNode I/O statistics to more precisely manage container replication. We've also introduced time-based peak/off-peak control logic for various DataNode management operations (such as deletion, container replication, and EC container reconstruction). These improvements are driven by real-world production needs, and from our perspective, they've shown positive results. However, since many of these PRs have some degree of code coupling with our previous contributions, it's difficult for us to combine everything into a single, unified patch for upstream submission.

Therefore, we hope to proceed with #7266 for now. If #8405 later results in a more complete or improved solution, we'd be happy to continue refining things in that direction. In the meantime, this also gives us a valuable opportunity to participate in the community and contribute to Ozone's development.
@adoroszlai Thank you very much for reviewing the code. I will make improvements based on your suggestions. @errose28's comments are essentially not in conflict with #7266, and I'm looking forward to seeing #7266 progress so that we can move forward with the subsequent work.
@adoroszlai Thank you very much for your detailed suggestions! I've made the changes accordingly. Could you review this PR again? Thank you very much! I respect @errose28's perspective. However, I believe this PR does not conflict with #8405, nor with HDDS-13096 or HDDS-13097; they can coexist. We've already spent considerable time reviewing this PR together, and I'd like to continue moving it forward.
@adoroszlai @errose28 Can I still continue to follow up on this PR? I feel that I've put in some effort, but right now I've lost a clear direction on how to proceed. According to @errose28's suggestion, this PR only needs to keep the RPC part, but I'm not sure how to continue working on the related functionality from here.
@slfan1989 Thanks for all your efforts on this PR. The concerns/suggestions raised by @errose28 make sense though. Please try to reach agreement. I won't be able to re-review until next week in any case.
@adoroszlai Thank you very much for your message and for your continued support and assistance! Since @errose28 is currently planning some new features, I believe this PR could be considered as part of that effort, especially given the amount of work we've already invested. As for which specific features should be retained, it would be helpful if @errose28 could review and provide guidance.
What changes were proposed in this pull request?
Currently, we lack tools on the SCM side to track failed disks on DataNodes. DataNodes have already reported this information, and we need to display it.
In this PR, we will display the failed disks on the DataNode. The information can be displayed in JSON format or using the default format.
What is the link to the Apache JIRA
JIRA: HDDS-11463. Track and display failed DataNode storage locations in SCM.
How was this patch tested?
Added JUnit tests and tested in a test environment.