Skip to content

HDDS-8436. Support setSafeMode(), isFileClosed() FileSystem API #4825

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 5, 2023

Conversation

jojochuang
Copy link
Contributor

What changes were proposed in this pull request?

Implement new Hadoop file system APIs to support HBase.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8436 Support setSafeMode(), isFileClosed() FileSystem API

How was this patch tested?

Unit tests

@jojochuang jojochuang changed the title Hdds 8436 HDDS-8436. Support setSafeMode(), isFileClosed() FileSystem API Jun 3, 2023
@jojochuang jojochuang force-pushed the HDDS-8436 branch 3 times, most recently from be98bf6 to 613acce Compare June 27, 2023 16:46
Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jojochuang , thanks a lot for working on this! The change looks good. Please see the comments inlined.

Comment on lines 4443 to 4481
case LEAVE:
case FORCE_EXIT:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we only support FORCE_EXIT. We should support LEAVE later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For LEAVE, it should exit safe mode only if preCheck is true. We should make the change separately.

//SCMSafeModeManager
@@ -237,20 +237,25 @@ public void completePreCheck(EventPublisher eventQueue) {
    * 2. Emits START_REPLICATION for ReplicationManager.
    * 3. Cleanup resources.
    * 4. Emit safe mode status.
-   * @param eventQueue
    */
   @VisibleForTesting
-  public void exitSafeMode(EventPublisher eventQueue) {
-    LOG.info("SCM exiting safe mode.");
-    // If safemode is exiting, then pre check must also have passed so
-    // set it to true.
-    setPreCheckComplete(true);
+  public boolean exitSafeMode(boolean force) {
+    LOG.info("SCM exiting safe mode, force? {}", force);
+    if (force) {
+      // If force exiting, set preCheck to true.
+      setPreCheckComplete(true);
+    } else {
+      if (!getPreCheckComplete()) {
+        return false;
+      }
+    }
     setInSafeMode(false);
 
     // TODO: Remove handler registration as there is no need to listen to
     // register events anymore.
 
     emitSafeModeStatus();
+    return true;
   }

if (!ofsPath.isKey()) {
throw new IOException("not a file");
}
OzoneFileStatus status = bucket.getFileStatus(pathStr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check status.isFile().

if (ofsPath.isSnapshotPath()) {
throw new IOException("file is in a snapshot.");
} else {
OzoneFileStatus status = bucket.getFileStatus(key);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check status.isFile().

@Override
public boolean setSafeMode(SafeModeAction action, boolean isChecked)
throws IOException {
statistics.incrementWriteOps(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if the action is not GET before incrementWriteOps.

@Override
public boolean setSafeMode(SafeModeAction action, boolean isChecked)
throws IOException {
statistics.incrementWriteOps(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if the action is not GET before incrementWriteOps.

@Override
public boolean setSafeMode(SafeModeAction action, boolean isChecked)
throws IOException {
statistics.incrementWriteOps(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if the action is not GET before incrementWriteOps.

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jojochuang , thanks for the update!

We need to check GET before incrementWriteOp in two of RootedOzoneFileSystems. BTW, is it possible to move the setSafeMode method to Basic RootedOzoneFileSystem?

@Override
public boolean setSafeMode(SafeModeAction action, boolean isChecked)
throws IOException {
statistics.incrementWriteOps(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if the action is not GET before incrementWriteOps.

@jojochuang
Copy link
Contributor Author

BTW, is it possible to move the setSafeMode method to Basic RootedOzoneFileSystem?

We can't, because Hadoop 2 does not have SafeMode interface (introduced in Hadoop 3.3.6). So I can only put it where Ozone is not run with Hadoop 2 runtime.

@jojochuang
Copy link
Contributor Author

But as it turns out, we test MR jobs with Hadoop 3.1 3.2 and 3.3 runtime, and the MR job tests fail with Hadoop 3.1 and 3.2 (due to the lacking of SafeMode and LeaseRecoverable interface). So either we find a workaround or remove the Hadoop 3.1 3.2 tests.

@jojochuang jojochuang marked this pull request as ready for review June 30, 2023 15:40
Change-Id: Iba9c50fac4da3f90acce8168758f4284e47825cb

Compiles

Change-Id: I53b7c4dbe33735ca3b39e4502be983134b05c0fe

Compiles

Change-Id: I699803f225460713f8f47371a3c45b4e10cf9cdf

Completed safe mode implementation.

Change-Id: I57ba72625c98bf7faa5c6e54526a7d61d7731fee

Checkstyle.

Change-Id: I1ffd5f1b2756fad7b03864cc3725e9d41cd72b10

Add missing @OverRide annotations, increment statistics properly.

Change-Id: Idb45f85b8a8f72776f393a8a0f22eb23e72f815b
Change-Id: I1d0260c14c3dac2ed28f6ea86730183c0630174f
Change-Id: I4ebc4360dde010c9906b292e0e32008146759aa3
Change-Id: Ia0e9e0c429ae6718f18a890258388b61cbcafabd
Change-Id: I5fec0e5e7653a1070b38be012e17d476fd865a7d
Change-Id: I48f884edac8092af910397536754cfda809c399a
Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 the change looks good.

@szetszwo szetszwo merged commit b97d943 into apache:master Jul 5, 2023
errose28 added a commit to errose28/ozone that referenced this pull request Jul 10, 2023
* master: (36 commits)
  HDDS-8990. Intermittent timeout waiting on datanode4 9856 to become available (apache#5039)
  Revert "HDDS-7750. Incorrect WRITE ACL check. (apache#4992)"
  HDDS-7750. Incorrect WRITE ACL check. (apache#4992)
  HDDS-8985. Intermittent timeout exiting safe mode in HA secure tests (apache#5033)
  HDDS-8593. Add RootCARotationPoller to CertClient (apache#5030)
  HDDS-7645. Kubernetes check should fail fast if cluster cannot start (apache#5028)
  HDDS-8981. TestRootedOzoneFileSystem runs out of disk space (apache#5029)
  HDDS-8592. Fetch and save all root certificates during service's certificate rotation. (apache#5025)
  HDDS-8981. Disable TestRootedOzoneFileSystem#testSafeMode
  HDDS-8591. Create scheduler to check for new root ca certificates (apache#4961)
  HDDS-8979. error validating kustomization.yaml (apache#5024)
  HDDS-8973. Ozone SCM HA should not allocates duplicate IDs when transferring leadership (apache#5018)
  HDDS-8970. Snapshot Diff should return path relative to bucket root (apache#5015)
  HDDS-8975. Clarify SCM HA auto-bootstrap doc (apache#5021)
  HDDS-8689. Rotate Root CA and Sub CA in SCM. (apache#4943)
  HDDS-8436. Support setSafeMode(), isFileClosed() FileSystem API (apache#4825)
  HDDS-8880. Intermittent fork timeout in TestOMRatisSnapshots (apache#5022)
  HDDS-8962. Ensure docker env is stopped (apache#5011)
  HDDS-7794. [snapshot] SnapshotDiff should throw better error messages for exception handling (apache#5007)
  HDDS-7922. [FSO] S3G folder support fso layout filestatus s3A compatibility (apache#4448)
  ...
@jojochuang jojochuang added the hbase HBase on Ozone support label Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hbase HBase on Ozone support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants