-
Notifications
You must be signed in to change notification settings - Fork 535
HDDS-12928. datanode min free space configuration #8388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
As discussion over community and above discussion, Approach 2 is better. Another point came for default min.free.space -- instead of |
...c/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java
Show resolved
Hide resolved
...c/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java
Show resolved
Hide resolved
...c/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java
Show resolved
Hide resolved
...st/java/org/apache/hadoop/ozone/container/common/statemachine/TestDatanodeConfiguration.java
Show resolved
Hide resolved
@@ -29,6 +29,7 @@ | |||
import static org.apache.hadoop.ozone.container.common.statemachine.DatanodeConfiguration.FAILED_DB_VOLUMES_TOLERATED_KEY; | |||
import static org.apache.hadoop.ozone.container.common.statemachine.DatanodeConfiguration.FAILED_METADATA_VOLUMES_TOLERATED_KEY; | |||
import static org.apache.hadoop.ozone.container.common.statemachine.DatanodeConfiguration.FAILED_VOLUMES_TOLERATED_DEFAULT; | |||
import static org.apache.hadoop.ozone.container.common.statemachine.DatanodeConfiguration.HDDS_DATANODE_VOLUME_MIN_FREE_SPACE_PERCENT_DEFAULT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a new unit test which doesn't explicitly set any of the two properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is considered in org.apache.hadoop.ozone.container.common.statemachine.TestDatanodeConfiguration#isCreatedWitDefaultValues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isCreatedWitDefaultValues unsets DatanodeConfiguration.HDDS_DATANODE_VOLUME_MIN_FREE_SPACE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unset ensure default value is used in ozone configuration, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unset is done for ozone-site.xml as defined in test module, so that it can use default value if not defined. comment added.
...c/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
@Test | ||
void useMinFreeSpaceIfBothMinFreeSpacePropertiesSet() { | ||
void useMaxAsPercentIfBothMinFreeSpacePropertiesSet() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
useMaxAsPercentIfBothMinFreeSpacePropertiesSet ->
useMaxIfBothMinFreeSpacePropertiesSet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@adoroszlai This is concluded to have in same PR if design is very small and need discussion. Having multiple PR for such changes may not be required. This is confirmed in community meeting. Please share if have different opinion over this ? |
I already mentioned why I think separate PR is better even for small design doc. Alternatively: once code changes are added to the PR, push any further design doc changes together with code changes. No need to split this PR now, but please consider it next time. |
@sumitagrawl Could you resolve the code conflict? Thanks. |
I think for small changes, it should be ok to put the design doc and the code change together. If the PR is churning on the design doc we can keep the PR in draft mode and switch it to ready once design doc looks good and the code works in the fork. This would make it much easier to to browse the git log and understand the code changes done in one go. From a usability stand point I would much rather have a small design doc live together with a code change when browsing history. |
done |
That's a good point. It should be fine, if developers can follow one of:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sumitagrawl for the patch.
This case is more useful for test environment where disk space is less and no need any additional configuration. | ||
|
||
# Conclusion | ||
1. Going with Approach 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think supporting both explicit size and percent is good, but there's a few issues still not addressed:
- Do we only want to support setting one global size for all volumes or supporting individual volume configs?
- If we are adjusting how
hdds.datanode.volume.min.free.space
works, we should also adjusthdds.datanode.dir.du.reserved
to support configuration in a consistent way. - It is bad UX to have two different configs (percent and value) for the same thing. The user has no intuition as to what happens when both are configured.
- Having a
max
function buried in the code to resolve this instead of making them exclusive is even worse.
- Having a
Probably the most user friendly thing to do is deprecate the percent config keys and have one config that takes either a size or percent based value. Whether we want to continue supporting individual volume mappings in the config is still an open question that needs to be resolved in this proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@errose28
The config is applicable for each volume as global config
only, do not support individual volume. As min-free-space as ozone maintain is global in nature for each volume.
hdds.datanode.dir.du.reserved
config simplification is not in scope of this JIRA/PR.
Using 2 config has been discussed in community meeting, and concluded to have both. Any concern now, need re-discuss over community again.
Single config: Approach "2" is not
being opted with majority, and hence went with Approach 1 as max of 2. I have updated in design doc for both Approach 1 and approach 2 pros/cons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using 2 config has been discussed in community meeting, and concluded to have both. Any concern now, need re-discuss over community again.
Community meetings are for synchronous discussion, not definitive decisions. There are many other forums (mailing list, PRs, Jira, Github discussion). I think this kind of issue is fine for discussion in PR. If you are concerned about visibility, please discuss on mailing list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@errose28 after discussion over community, will go with Approach 1
only.
-
purpose of du.reserved config is identifying the disk to be reserved for the application sharing the disk, and hence its at disk level. But here, since its ozone managed space, this needs to be flat configuration. So both need not be same.
-
For simplicity for min.free.space config, its at global level, and may not be required to be disk level similar to reserved.
-
Max of min.free space and percent is done, min.free space represent min threshold for most of the disk ranges, and percent to be if some disk are exceptionally higher size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me try to add some guiding principles for config modifications which can help us compare one decision or another. The following are usability issues that can occur with config keys:
- Inconsistent config format: Configs that operate on similar entities (space usage, address + port, percentages) that read those values differently.
- Hidden config dependencies: When one configuration whose value is unchanged functions differently based on the value applied to a different config.
- This does not include invalid config combinations that fail component startup, since that is easily caught and called out with an error message. We know that no actively running system will have this configuration.
Both hdds.datanode.du.reserved{.percent}
and hdds.datanode.min.free.space{.percent}
have issues here, and this is our chance to fix them. Now let's look at how our options either help or hurt the above points.
Inconsistent config format
hdds.datanode.du.reserved
and hdds.datanode.min.free.space
are both used to configure space reservation on datanode drives, so as stated in point 1 it is most intuitive if they accept the same value format. It is ok if one format is more useful for one than another. For example per-volume configuration may be required for hdds.datanode.du.reserved
but not for hdds.datanode.min.free.space
. It's still ok for both to have that option because it is not invalid for hdds.datanode.min.free.space
, there is still only one set of formatting options for users to remember, and only one parser in the code. If we pick and choose different valid formats for each config we will have two formats to remember and two parsers in the code. Therefore even removing allowed config formats from hdds.datanode.min.free.space
that are still present in hdds.datanode.du.reserved
actually adds complexity. Based on this hdds.datanode.du.reserved
and hdds.datanode.min.free.space
must accept values of the same format to avoid introducing new config usability problems.
Hidden config dependencies
Next let's look at how the percent variations affect point 2. Anything other than failing startup if the percent and non-percent variations are specified creates this problem, so if a percent and non-percent config key are given like hdds.datanode.min.free.space.percent
and hdds.datanode.min.free.space
it must be considered invalid and fail the datanode.
There is another option though: get rid of the percentage specific config keys but still support percentage based configuration with the one hdds.datanode.min.free.space
config. Let's look at why this works:
hdds.datanode.du.reserved
needs to support volume specific configuration in the form of<volume-path>:reserved-size
since not all volumes may be used as spill for compute, or the volumes may be utilized differently.- This means we will always have a parsing method like VolumeUsage#getReserved to handle converting config strings into long values for a volume.
hdds.datanode.min.free.space
andhdds.datanode.du.reserved
should support the same value format, sohdds.datanode.min.free.space
also needs to use this same parser.- If we are already need a string parser for both configs, we might as well make it differentiate between percentage and size based configs too.
Proposal to address all requirements
The following layout meets all the constraints defined above:
- Only two config keys:
hdds.datanode.min.free.space
andhdds.datanode.du.reserved
- The valid formats for either config key are:
- A fixed size, like
20GB
- A percentage as a float, like
0.001
. The lack of a unit differentiates it from the first option. - A mapping of volumes to sizes, like
/data/hdds1:20GB,/data/hdds2:10GB
- A fixed size, like
- Only one parser is required for both types of configs.
- This is not new since a parser is already required and cannot be removed without removing support for per-volume configuration in
hdds.datanode.du.reserved
.
- This is not new since a parser is already required and cannot be removed without removing support for per-volume configuration in
We should never introduce usability issues in our configurations. We have enough of them already : ) If you can show how an alternate proposal meets all the configuration requirements without impacting usability we can consider that as well, but currently none of the proposals in the doc satisfy this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@errose28 You mean we need have another config for min.free.space?
min.free.space.volumes:/data/hdds1:20GB,/data/hdds2:10GB. -- to be similar to reserve? as similar to du.reserve
I do not feel being in name of similar config for space, we should go with this approach, These are if different purpose. Making similar just in name of both represent free space
will make configuration complex for min.free.space as user need config for all disk. There is no usecase till not for min.free.space for this.
I do not agree with this approach. In future if there is a need for this for volume mapping for min.free.space, we can ad as separate requirement and handle.
Share your suggestion for this PR if can be merged .....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding one more to @errose28's list of requirements: cross-compatibility. When extending the possible values allowed for existing configuration, e.g.:
- adding suffix
- starting to support percentage
- allowing list of items instead of single one
we need to consider that even old version may encounter values understood only by new one, and fail. (See HDDS-13077 for a specific example.)
In such cases it may be better to deprecate the existing config properties and add new one(s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sumitagrawl please re-read the Proposal to address all requirements section in my reply. I think this very clearly states the proposal but the things you are referring to in your reply are not mentioned there.
You mean we need have another config for min.free.space?
No, two configs, one for min free space and one for DU reserved that each use the same value schema. I very clearly said in the previous response "Only two config keys: hdds.datanode.min.free.space and hdds.datanode.du.reserved".
I do not feel being in name of similar config for space, we should go with this approach, These are if different purpose.
This is your take as developer. You need to look at this from a user's perspective. Our consistent failure to consider this perspective is why the system is difficult to use. Configs representing the same "type" of configuration, be it an address, percentage, disk space, time duration, etc must accept the same types of values. Users are not going to understand the nuance of why two similar configs accept different value formats, and in a few months I probably won't either.
Making similar just in name of both represent free space will make configuration complex for min.free.space as user need config for all disk.
This is not part of the proposal. Please re-read it. Min space can be configured with one value across all disks, OR it can use a volume mapping.
There is no usecase till not for min.free.space for this.
Lack of use case is not a valid reason to create a separate value schema for configs that work on the same type. There is also no use case for setting hdds.heartbeat.interval
to 7d
, but the same value makes perfect sense for hdds.container.scrub.data.scan.interval
. Yet they use the same value schema because they both represent time intervals. Your suggestion is analogous to rejecting the d
suffix for hdds.heartbeat.interval
because it would never be set that long.
we need to consider that even old version may encounter values understood only by new one, and fail.
We definitely need to formalize our configuration compatibility guarantees. This probably warrants a dedicated discussion somewhere more visible. My initial take is that we should always support "new software old config", but that supporting "old software new config" is not sustainable because it closes our config for extensions. Especially on the server side this would seem like a deployment error. Maybe our client side config compat guarantees would be different from the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, two configs, one for min free space and one for DU reserved that each use the same value schema
DU reserved is special case carried from Hadoop, for case of disk sharing by other application. This may not be required to have same value Schema. This needs user input over various disk as sharing may differ, so this schema is specialized. They are not of same type.
This is your take as developer. You need to look at this from a user's perspective. Our consistent failure to consider this perspective is why the system is difficult to use.
From user perspective only, user have no knowledge how to configure the min-free-space, this is more internal
to Ozone working.
volume mapping
This might be additional config can be added later on on need basis. May be we should not add just based on
intuition, as this may go to be dead config.
Please share any possible use case in practical env, we can take up this as enhancement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per disk configuration is an abomination that stems from needing to run other applications on nodes/drives along with HDFS in the past. It makes sense for the du
config where essentially we tell the Datanode to spare a few drives. This is very different from the min
configurations, which has to do with operations and uptime of applications. We must keep configurations for min
same across all drives as it has to do with space for repairs and recovery and nothing to do with configuration of the cluster with regards to co-existing with peer applications.
I am all for consistency but in this case it implies a capability that I am not sure we wish to implement.
...est-s3/src/test/java/org/apache/hadoop/ozone/s3/awssdk/v1/TestS3SDKV1WithRatisStreaming.java
Outdated
Show resolved
Hide resolved
@errose28 @kerneltime please share opinion, based on this will merge this PR |
What changes were proposed in this pull request?
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-12928
How was this patch tested?