Skip to content

HDDS-12778. Change ordering of transfer in OM bootstrap process. #8245

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

sadanand48
Copy link
Contributor

@sadanand48 sadanand48 commented Apr 8, 2025

What changes were proposed in this pull request?

The following changes are done:

  1. change the ordering of transfer (active DB at the last and snapshot data first)
  2. Additional debug logs for the bootstrap process
  3. Wait for existence of snapshot dirs only for the ones present in table cache
  4. Do not start streaming the tarball until getFilesForArchive() completes as : If this throws an exception, an empty tarball is sent.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-12778

Copy link
Contributor

@hemantk-12 hemantk-12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @sadanand48 for the improvement. It will reduce unnecessary memtable flush and checkpoint creation for tarball.

What type of testing are we doing for it? Also, can you fix the test cases?

if (!processDir(dir, copyFiles, hardLinkFiles, sstFilesToExclude,
new HashSet<>(), excluded, copySize, null)) {
return false;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove this extra line.

Comment on lines +314 to +318
if (LOG.isDebugEnabled()) {
if (processedSnapshotData) {
LOG.debug("Finished processing OM snapshot data");
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: there is no need to do LOG.isDebugEnabled() check for a simple string. It is useful when String concatenation is needed.

Suggested change
if (LOG.isDebugEnabled()) {
if (processedSnapshotData) {
LOG.debug("Finished processing OM snapshot data");
}
}
if (processedSnapshotData) {
LOG.debug("Finished processing OM snapshot data");
}

Alternatively

if (LOG.isDebugEnabled() && processedSnapshotData) {
  LOG.debug("Finished processing OM snapshot data");
}

boolean processedTmpSstCompactionDir = processDir(sstBackupDir.getTmpDir().toPath(),
copyFiles, hardLinkFiles, sstFilesToExclude, new HashSet<>(),
excluded, copySize, sstBackupDir.getOriginalDir().toPath());
if (LOG.isDebugEnabled()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as previous.

Comment on lines +324 to +326
boolean processedTmpSstCompactionDir = processDir(sstBackupDir.getTmpDir().toPath(),
copyFiles, hardLinkFiles, sstFilesToExclude, new HashSet<>(),
excluded, copySize, sstBackupDir.getOriginalDir().toPath());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
boolean processedTmpSstCompactionDir = processDir(sstBackupDir.getTmpDir().toPath(),
copyFiles, hardLinkFiles, sstFilesToExclude, new HashSet<>(),
excluded, copySize, sstBackupDir.getOriginalDir().toPath());
boolean processedSstCompactionBackupDir = processDir(sstBackupDir.getTmpDir().toPath(),
copyFiles, hardLinkFiles, sstFilesToExclude, new HashSet<>(),
excluded, copySize, sstBackupDir.getOriginalDir().toPath());

Comment on lines +337 to +339
boolean processedTmpCompactionLogDir = processDir(compactionLogDir.getTmpDir().toPath(),
copyFiles, hardLinkFiles, sstFilesToExclude, new HashSet<>(),
excluded, copySize, compactionLogDir.getOriginalDir().toPath());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's just call it processedCompactionLogDir. We create tmp for local referencing only.

Suggested change
boolean processedTmpCompactionLogDir = processDir(compactionLogDir.getTmpDir().toPath(),
copyFiles, hardLinkFiles, sstFilesToExclude, new HashSet<>(),
excluded, copySize, compactionLogDir.getOriginalDir().toPath());
boolean processedCompactionLogDir = processDir(compactionLogDir.getTmpDir().toPath(),
copyFiles, hardLinkFiles, sstFilesToExclude, new HashSet<>(),
excluded, copySize, compactionLogDir.getOriginalDir().toPath());

@@ -380,18 +408,24 @@ private Set<Path> getSnapshotDirs(DBCheckpoint checkpoint, boolean waitForDir)
iterator = checkpointMetadataManager
.getSnapshotInfoTable().iterator()) {

// For each entry, wait for corresponding directory.
// For each entry, wait for corresponding directory if the key is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we assuming that whatever is flushed to disk must have a snapshot directory?

@jojochuang jojochuang added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Apr 30, 2025
@@ -114,10 +114,17 @@ public static void createHardLinks(Path dbPath) throws IOException {

// Create a link for each line.
for (String l : lines) {
String from = l.split("\t")[1];
String to = l.split("\t")[0];
String[] parts = l.split("\t");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there test code added for the new change in createHardLinks?

omCheckpointUrl + ". ErrorCode: " + errorCode);
omCheckpointUrl + ". ErrorCode: " + errorCode;
LOG.error(message);
throw new IOException(message);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. connection object should call disconnect() before throwing the exception.
  2. This is unrelated, but the caller of this method, RDBSnapProvider.downloadDBSnapshotFromLeader() does not retry on exception. Should an exception be retried?

* @param lines Text lines defining the link paths.
* @param testDirName Name of test directory.
*/
private void checkFabricatedLines(Set<String> directories, List<String> lines,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed because the usage is removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
snapshot https://issues.apache.org/jira/browse/HDDS-6517
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants