HDDS-11513. All deletion configurations should be configurable without restart #8003

Draft
wants to merge 1 commit into
base: master

sarvekshayr
Contributor

What changes were proposed in this pull request?

The following deletion-related configurations are now dynamically reconfigurable without a restart.
This makes it easier to tune deletion in response to logs and metrics.

OM:

  • ozone.directory.deleting.service.interval
  • ozone.thread.number.dir.deletion

DATANODE:

  • ozone.block.deleting.service.interval
  • ozone.block.deleting.service.timeout

What is the link to the Apache JIRA

HDDS-11513

How was this patch tested?

OM:

  • List all OM reconfigurable properties
bash-5.1$ ozone admin reconfig --service=OM --address=om:9862 properties
OM: Node [om:9862] Reconfigurable properties:
ozone.administrators
ozone.directory.deleting.service.interval
ozone.key.deleting.limit.per.task
ozone.om.server.list.max.size
ozone.om.volume.listall.allowed
ozone.readonly.administrators
ozone.thread.number.dir.deletion
  • Reconfigured ozone.directory.deleting.service.interval successfully
bash-5.1$ ozone admin reconfig --service=OM --address=om:9862 start     
OM: Started reconfiguration task on node [om:9862].
bash-5.1$ ozone admin reconfig --service=OM --address=om:9862 status
OM: Reconfiguring status for node [om:9862]: started at Tue Mar 04 05:51:56 UTC 2025 and finished at Tue Mar 04 05:51:56 UTC 2025.
SUCCESS: Changed property ozone.directory.deleting.service.interval
        From: "1m"
        To: "2m"
  • Reconfigured ozone.thread.number.dir.deletion successfully
bash-5.1$ ozone admin reconfig --service=OM --address=om:9862 start     
OM: Started reconfiguration task on node [om:9862].
bash-5.1$ ozone admin reconfig --service=OM --address=om:9862 status
OM: Reconfiguring status for node [om:9862]: started at Tue Mar 04 06:41:19 UTC 2025 and finished at Tue Mar 04 06:41:19 UTC 2025.
SUCCESS: Changed property ozone.thread.number.dir.deletion
        From: ""
        To: "20"
  • Error case when ozone.thread.number.dir.deletion value is negative.
bash-5.1$ ozone admin reconfig --service=OM --address=om:9862 start     
OM: Started reconfiguration task on node [om:9862].
bash-5.1$ ozone admin reconfig --service=OM --address=om:9862 status
OM: Reconfiguring status for node [om:9862]: started at Tue Mar 04 07:07:08 UTC 2025 and finished at Tue Mar 04 07:07:08 UTC 2025.
FAILED: Change property ozone.thread.number.dir.deletion
        From: ""
        To: "-20"
        Error: ozone.thread.number.dir.deletion cannot be negative..
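
The negative-value error case above can be illustrated with a minimal sketch of a validation step run before a new value is accepted. The class and method names here are hypothetical and are not taken from the patch; only the property name and the wording of the error message come from the output shown above.

```java
/**
 * Hypothetical sketch of validating a reconfigured thread count.
 * Not the patch's actual code: the class and method names are assumptions.
 */
class DirDeletionThreadsValidator {
  // Property name as it appears in the reconfiguration output.
  static final String KEY = "ozone.thread.number.dir.deletion";

  /** Parses the new value and rejects negative numbers. */
  static int validate(String newValue) {
    // NumberFormatException (an IllegalArgumentException) signals non-numeric input.
    int threads = Integer.parseInt(newValue.trim());
    if (threads < 0) {
      throw new IllegalArgumentException(KEY + " cannot be negative.");
    }
    return threads;
  }
}
```

With this sketch, `validate("20")` returns 20, while `validate("-20")` throws and leaves the previous value in place, which matches the FAILED status reported above.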

DATANODE:

  • List all DN reconfigurable properties
bash-5.1$ ozone admin reconfig --service=DATANODE --address=datanode:19864 properties
DN: Node [datanode:19864] Reconfigurable properties:
hdds.datanode.block.delete.threads.max
hdds.datanode.block.deleting.limit.per.interval
hdds.datanode.replication.streams.limit
ozone.block.deleting.service.interval
ozone.block.deleting.service.timeout
ozone.block.deleting.service.workers
  • Reconfigured ozone.block.deleting.service.interval and ozone.block.deleting.service.timeout successfully
bash-5.1$ ozone admin reconfig --service=DATANODE --address=datanode:19864 start     
DN: Started reconfiguration task on node [datanode:19864].
bash-5.1$ ozone admin reconfig --service=DATANODE --address=datanode:19864 status
DN: Reconfiguring status for node [datanode:19864]: started at Tue Mar 04 05:50:29 UTC 2025 and finished at Tue Mar 04 05:50:29 UTC 2025.
SUCCESS: Changed property ozone.block.deleting.service.interval
        From: "1m"
        To: "2m"
SUCCESS: Changed property ozone.block.deleting.service.timeout
        From: "300000ms"
        To: "350000ms"

Contributor

@adoroszlai adoroszlai left a comment

Thanks @sarvekshayr for the patch.

Making properties reconfigurable is a good first step, but the services do not pick up the updated configurations. dirDeletingServiceInterval etc. are only used by the tests.

Comment on lines +68 to +69
private String blockDeletingServiceInterval;
private String blockDeletingServiceTimeout;

These should not store the raw String config.
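
One way to read this suggestion is to parse the value once, when it is set or reconfigured, and keep only the typed result. The sketch below is a simplified illustration under that assumption. `BlockDeletingSettings` and its small suffix parser are hypothetical stand-ins, not Ozone classes; the real code would presumably rely on the existing configuration helpers such as `getTimeDuration`.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical holder that keeps a parsed Duration instead of the raw String.
 * The parser accepts only the ms, s, m and h suffixes and is an assumption
 * made for illustration.
 */
class BlockDeletingSettings {
  private static final Pattern TIME = Pattern.compile("(\\d+)\\s*(ms|s|m|h)");

  // Typed value; readers never see an unparsed or invalid string.
  private volatile Duration interval;

  BlockDeletingSettings(String rawInterval) {
    this.interval = parse(rawInterval);
  }

  /** Reconfiguration hook: parse first, assign only if parsing succeeds. */
  void reconfigureInterval(String rawInterval) {
    this.interval = parse(rawInterval);
  }

  Duration getInterval() {
    return interval;
  }

  static Duration parse(String raw) {
    Matcher m = TIME.matcher(raw.trim());
    if (!m.matches()) {
      throw new IllegalArgumentException("Invalid duration: " + raw);
    }
    long amount = Long.parseLong(m.group(1));
    switch (m.group(2)) {
      case "ms": return Duration.ofMillis(amount);
      case "s":  return Duration.ofSeconds(amount);
      case "m":  return Duration.ofMinutes(amount);
      default:   return Duration.ofHours(amount);
    }
  }
}
```

Because parsing happens before assignment, a malformed value fails the reconfiguration and the previously stored interval stays in effect.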

@aryangupta1998
Contributor

@adoroszlai, dirDeletingServiceInterval (ozone.directory.deleting.service.interval) and blockDeletingServiceInterval (ozone.block.deleting.service.interval) are used by the background services to determine the interval between their executions. Besides being used in tests, these intervals are configured for the background services that delete directories and blocks. The background service uses scheduleWithFixedDelay to schedule its task at the specified interval:
public void start() { exec.scheduleWithFixedDelay(service, 0, interval, unit); }

@adoroszlai
Contributor

BlockDeletingService is created with interval and timeout taken from config:

Duration blockDeletingSvcInterval = dnConf.getBlockDeletionInterval();
long blockDeletingServiceTimeout = config
    .getTimeDuration(OZONE_BLOCK_DELETING_SERVICE_TIMEOUT,
        OZONE_BLOCK_DELETING_SERVICE_TIMEOUT_DEFAULT,
        TimeUnit.MILLISECONDS);
int blockDeletingServiceWorkerSize = config
    .getInt(OZONE_BLOCK_DELETING_SERVICE_WORKERS,
        OZONE_BLOCK_DELETING_SERVICE_WORKERS_DEFAULT);
blockDeletingService =
    new BlockDeletingService(this, blockDeletingSvcInterval.toMillis(),
        blockDeletingServiceTimeout, TimeUnit.MILLISECONDS,
        blockDeletingServiceWorkerSize, config,
        datanodeDetails.threadNamePrefix(),
        context.getParent().getReconfigurationHandler());

They are stored in the base BackgroundService class:

public BackgroundService(String serviceName, long interval,
    TimeUnit unit, int threadPoolSize, long serviceTimeout,
    String threadNamePrefix) {
  this.interval = interval;
  this.unit = unit;
  this.serviceName = serviceName;
  this.serviceTimeoutInNanos = TimeDuration.valueOf(serviceTimeout, unit)
      .toLong(TimeUnit.NANOSECONDS);

which schedules the task when started:

public void start() {
  exec.scheduleWithFixedDelay(service, 0, interval, unit);

Reconfiguring does not change values in BackgroundService (they are final), and even updating interval wouldn't change the task already scheduled with fixed delay (it needs to be rescheduled).
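
The rescheduling point can be sketched as follows. This is a minimal illustration, not the Ozone `BackgroundService`: `ReschedulableService` and its methods are hypothetical names, and the only library calls used are the standard `ScheduledExecutorService.scheduleWithFixedDelay` and `ScheduledFuture.cancel`. The sketch keeps the interval in a non-final field, remembers the handle of the scheduled task, and on reconfiguration cancels that handle and schedules the task again with the new delay.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a periodic service whose interval can change at runtime. */
class ReschedulableService {
  private final ScheduledExecutorService exec =
      Executors.newSingleThreadScheduledExecutor();
  private final Runnable task;
  private long intervalMillis;        // deliberately not final
  private ScheduledFuture<?> future;  // handle to the currently scheduled task

  ReschedulableService(Runnable task, long intervalMillis) {
    this.task = task;
    this.intervalMillis = intervalMillis;
  }

  synchronized void start() {
    future = exec.scheduleWithFixedDelay(
        task, 0, intervalMillis, TimeUnit.MILLISECONDS);
  }

  /** Applies a new interval by cancelling the old schedule and creating a new one. */
  synchronized void setInterval(long newIntervalMillis) {
    if (newIntervalMillis <= 0) {
      throw new IllegalArgumentException("interval must be positive");
    }
    intervalMillis = newIntervalMillis;
    if (future != null) {
      // false: a run that is already in progress is allowed to finish.
      future.cancel(false);
      future = exec.scheduleWithFixedDelay(
          task, newIntervalMillis, newIntervalMillis, TimeUnit.MILLISECONDS);
    }
  }

  synchronized long getIntervalMillis() {
    return intervalMillis;
  }

  void shutdown() {
    exec.shutdownNow();
  }
}
```

A reconfiguration callback could then call `setInterval` with the parsed new value. Whether the first run after a change should happen immediately or after one full new interval is a design choice; the sketch waits one new interval.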

@aryangupta1998
Contributor

Thanks for the clarification, @adoroszlai. I checked the code: for these configs to take effect, the OM has to be restarted. The "ozone om" command starts the key manager, which then initializes the block deleting and directory deleting services with their intervals. So, simply updating these properties without starting the OM again won't be effective here.

@adoroszlai
Contributor

simply updating these properties without starting the OM again won't be effective here

That's my point. OM can be tweaked to pick up the config without restart, but the current patch is not enough for that yet.

@adoroszlai adoroszlai marked this pull request as draft March 4, 2025 18:06