HDDS-10374. Make container scanner generate merkle trees during the scan #7490
base: HDDS-10239-container-reconciliation
Conversation
…anner-builds-mt
* HDDS-10239-container-reconciliation:
  HDDS-10928. Implement container comparison logic within datanodes. (apache#7293)
Missing chunks are incorrectly reported as corrupted
Metadata scanner integration tests are passing
@@ -1348,7 +1358,6 @@ public void markContainerUnhealthy(Container container, ScanResult reason)
    } finally {
      container.writeUnlock();
    }
    createContainerMerkleTree(container);
If a container moves from OPEN to UNHEALTHY state, we should try to build a merkle tree with whatever data we have at the moment, before the scanner builds the actual merkle tree. If for some reason (either a metadata or data error) we are unable to build it, then we can log the exception and move on.
Yes, when we have an unhealthy container we should build the tree from the actual data, but this method only builds the tree from the metadata. I'm renaming it in the next commit to make that clearer. We should not depend on just the metadata if we suspect corruption. The way this should work is open -> unhealthy would trigger an on-demand scan of the container, but that code path is not in place right now. We may want to add that to our branch actually.
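To make the proposed flow concrete, here is a minimal sketch of the open -> unhealthy transition triggering an on-demand scan. All names here (`markUnhealthy`, the `LongConsumer` scan hook) are hypothetical stand-ins, not Ozone's real APIs; the point is only the ordering: change state first, then let the scanner produce the authoritative tree from the data.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongConsumer;

/**
 * Sketch (hypothetical names, not Ozone's real classes) of the flow
 * discussed above: when a container goes open -> unhealthy, request an
 * on-demand data scan so the merkle tree is rebuilt from actual data
 * rather than from metadata alone.
 */
public class UnhealthyTransitionSketch {

  /** Marks the container unhealthy, then requests an on-demand scan.
   *  Returns the new state string for illustration. */
  static String markUnhealthy(long containerId, LongConsumer onDemandScan) {
    // The state change happens first; the scan is requested afterwards so
    // the scanner, not this method, produces the authoritative merkle tree.
    String newState = "UNHEALTHY";
    onDemandScan.accept(containerId);
    return newState;
  }

  public static void main(String[] args) {
    AtomicBoolean scanRequested = new AtomicBoolean(false);
    String state = markUnhealthy(7L, id -> scanRequested.set(true));
    System.out.println(state + " scanRequested=" + scanRequested.get());
  }
}
```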
Yes, I think we should add the on-demand scan here to generate the Merkle tree.
...service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java
Overall the change looks good to me. I have left a minor follow-up on my previous comment.
write(data, checksumInfoBuilder.build());
LOG.debug("Data merkle tree for container {} updated", containerID);
// If the write succeeds, update the checksum in memory. Otherwise 0 will be used to indicate the metadata failure.
Should there be a way to tell whether a checksum failed vs. the scanner simply not having run yet? Should failure to generate a checksum == -1?
As discussed, the in-memory hash should match what is written to disk, since it is an in-memory cache. If updating the Merkle tree is failing, we can add a metric plus a log message, and/or a status report to SCM, or a new Admin API on the Datanode to query the Merkle tree. (I think there are going to be multiple debug scenarios where we would like to ask a Datanode what it knows, independent of SCM reports.)
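A small sketch of the update-after-write ordering this thread settles on: the cached in-memory checksum is only replaced once the tree has actually been persisted, so 0 keeps its meaning of "no tree written yet / metadata failure". The names `persistTree` and the `diskOk` flag are illustrative, not the real `ContainerChecksumTreeManager` API.

```java
/**
 * Sketch of the ordering discussed above: the in-memory checksum is a
 * cache of what is on disk, so it is only updated after a successful
 * write. A value of 0 continues to mean "not yet generated or the
 * metadata write failed". Names here are hypothetical stand-ins.
 */
public class ChecksumCacheSketch {
  static long inMemoryChecksum = 0L; // 0 = no tree persisted yet

  /** Simulates persisting the tree; diskOk stands in for write success. */
  static boolean persistTree(long newChecksum, boolean diskOk) {
    if (!diskOk) {
      // Leave the cached value untouched; a metric/log could record this.
      return false;
    }
    inMemoryChecksum = newChecksum; // cache now mirrors the on-disk tree
    return true;
  }

  public static void main(String[] args) {
    persistTree(42L, false);
    System.out.println(inMemoryChecksum); // still 0 after a failed write
    persistTree(42L, true);
    System.out.println(inMemoryChecksum); // 42 after a successful write
  }
}
```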
...e/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java
...rc/main/java/org/apache/hadoop/ozone/container/ozoneimpl/BackgroundContainerDataScanner.java
…anner-builds-mt
* HDDS-10239-container-reconciliation: (646 commits)
  HDDS-11763. Implement container repair logic within datanodes. (apache#7474)
  HDDS-11887. Recon - Identify container replicas difference based on content checksums (apache#7942)
  HDDS-12397. Persist putBlock for closed container. (apache#7943)
  HDDS-12361. Mark testGetSnapshotDiffReportJob as flaky
  HDDS-12375. Random object created and used only once (apache#7933)
  HDDS-12335. Fix ozone admin namespace summary to give complete output (apache#7908)
  HDDS-12364. Require Override annotation for overridden methods (apache#7923)
  HDDS-11530. Support listMultipartUploads max uploads and markers (apache#7817)
  HDDS-11867. Remove code paths for non-Ratis SCM. (apache#7911)
  HDDS-12363. Add import options to IntelliJ IDEA style settings (apache#7921)
  HDDS-12215. Mark testContainerStateMachineRestartWithDNChangePipeline as flaky
  HDDS-12331. BlockOutputStream.failedServers is not thread-safe (apache#7885)
  HDDS-12188. Move server-only upgrade classes from hdds-common to hdds-server-framework (apache#7903)
  HDDS-12362. Remove temporary checkstyle suppression file (apache#7920)
  HDDS-12343. Fix spotbugs warnings in Recon (apache#7902)
  HDDS-12286. Fix license headers and imports for ozone-tools (apache#7919)
  HDDS-12284. Fix license headers and imports for ozone-s3-secret-store (apache#7917)
  HDDS-12275. Fix license headers and imports for ozone-integration-test (apache#7904)
  HDDS-12164. Rename and deprecate DFSConfigKeysLegacy config keys (apache#7803)
  HDDS-12283. Fix license headers and imports for ozone-recon-codegen (apache#7916)
  ...
Conflicts:
  hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java
  hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerMerkleTreeWriter.java
  hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java
  hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
  hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/checksum/ContainerMerkleTreeTestUtils.java
  hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/checksum/TestContainerMerkleTreeWriter.java
  hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueContainerCheck.java
  hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestBackgroundContainerDataScanner.java
  hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOnDemandContainerDataScanner.java
  hadoop-ozone/dist/src/main/smoketest/admincli/container.robot
  hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/scanner/TestBackgroundContainerDataScannerIntegration.java
  hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/scanner/TestBackgroundContainerMetadataScannerIntegration.java
  hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/scanner/TestOnDemandContainerDataScannerIntegration.java
Thanks @errose28 for the patch.
@@ -465,26 +476,46 @@ private static List<ContainerScanError> verifyChecksum(BlockData block,
        StringUtils.bytes2Hex(expected.asReadOnlyByteBuffer()),
        StringUtils.bytes2Hex(actual.asReadOnlyByteBuffer()),
        block.getBlockID());
    chunkHealthy = false;
We can break out of the loop here, because the smallest verification unit is a chunk.
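The suggested early exit can be sketched like this: since a chunk is the smallest unit the scanner verifies, checksum comparison can stop at the first mismatch. `firstCorruptChunk` is a hypothetical helper, not the real `verifyChecksum` code.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of the suggested early exit: stop comparing once the first
 * corrupt chunk is found, since a chunk is the smallest verification
 * unit. firstCorruptChunk is an illustrative stand-in, not Ozone code.
 */
public class ChunkVerifySketch {

  /** Returns the index of the first mismatching chunk, or -1 if all match. */
  static int firstCorruptChunk(List<byte[]> expected, List<byte[]> actual) {
    for (int i = 0; i < expected.size(); i++) {
      if (!Arrays.equals(expected.get(i), actual.get(i))) {
        return i; // break out: no need to compare the remaining chunks
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    List<byte[]> onDisk = List.of(new byte[]{1}, new byte[]{9}, new byte[]{3});
    List<byte[]> recorded = List.of(new byte[]{1}, new byte[]{2}, new byte[]{3});
    System.out.println(firstCorruptChunk(recorded, onDisk)); // 1
  }
}
```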
@@ -32,9 +32,10 @@ public enum FailureType {
    MISSING_METADATA_DIR,
    MISSING_CONTAINER_FILE,
    MISSING_CHUNKS_DIR,
    MISSING_CHUNK_FILE,
    MISSING_DATA_FILE,
qq: Is this for the cases when a *.block file is missing?
Yes. I changed the name so it is agnostic of container layout.
…anner-builds-mt
* HDDS-10239-container-reconciliation:
  HDDS-12849. On demand container scanner should not be static (apache#8296)
  HDDS-12745. Container checksum should be reported during container close and DN restart (apache#8204)
Conflicts:
  hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OnDemandContainerDataScanner.java
  hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OzoneContainer.java
  hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/impl/TestContainerSet.java
  hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOnDemandContainerDataScanner.java
…builds-mt
* reconcile-unit-test-framework:
  Checkstyle
  Update test name
  Improve char data generation, reset scan metrics
  Wait for on-demand scans to complete in test
  Use tree writer between peer updates. All tests pass
  Add checksum validation, generate readable data
  Logging improvements
  Use mixin to standardize scanner operations, log checksum changes in scanner
  Fix tree tracking during reconcile process
  Improve logging in test and prod code
  Most tests passing
  WIP migrate reconciliation unit tests
Conflicts:
  hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OnDemandContainerDataScanner.java
Clean up checksum update invocation using container controller.
…anner-builds-mt
* HDDS-10239-container-reconciliation:
  HDDS-12980. Add unit test framework for reconciliation. (apache#8402)
Conflicts:
  hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java
  hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerMerkleTreeWriter.java
  hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
  hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerController.java
  hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/checksum/ContainerMerkleTreeTestUtils.java
  hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/checksum/TestContainerMerkleTreeWriter.java
  hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestContainerReconciliationWithMockDatanodes.java
  hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/checksum/TestContainerCommandReconciliation.java
…anner-builds-mt
* HDDS-10239-container-reconciliation: (433 commits)
  Revert "HDDS-11232. Spare InfoBucket RPC call for the FileSystem#getFileStatus calls for the general case. (apache#6988)" (apache#8358)
  HDDS-12993. Use DatanodeID in ReconNodeManager. (apache#8456)
  HDDS-13024. HTTP connect to 0.0.0.0 failed (apache#8439)
  HDDS-12914. Bump awssdk to 2.31.40, test ResumableFileDownload (apache#8455)
  HDDS-4677. Document Ozone Ports and Connection End Points (apache#8226)
  HDDS-13044. Remove DatanodeDetails#getUuid usages from hdds-common/client/container-service (apache#8462)
  HDDS-12568. Implement MiniOzoneCluster.Service for Recon (apache#8452)
  HDDS-13046. Add .vscode to .gitignore (apache#8461)
  HDDS-13022. Split up exclusive size tracking for key and directory cleanup in SnapshotInfo (apache#8433)
  HDDS-12966. DBDefinitionFactory should not throw InvalidArnException (apache#8454)
  HDDS-12948. [Snapshot] Increase `ozone.om.fs.snapshot.max.limit` default value to 10k (apache#8376)
  HDDS-11904. Fix HealthyPipelineSafeModeRule logic. (apache#8386)
  HDDS-12989. Throw CodecException for the Codec byte[] methods (apache#8444)
  HDDS-11298. Support owner field in the S3 listBuckets request (apache#8315)
  HDDS-12810. Check and reserve space atomically in VolumeChoosingPolicy (apache#8360)
  HDDS-13016. Add a getAllNodeCount() method to NodeManager. (apache#8445)
  HDDS-12299. Merge OzoneAclConfig into OmConfig (apache#8383)
  HDDS-12501. Remove OMKeyInfo from SST files in backup directory. (apache#8214)
  HDDS-13021. Make callId unique for each request in AbstractKeyDeletingService (apache#8432)
  HDDS-13000. [Snapshot] listSnapshotDiffJobs throws NPE. (apache#8418)
  ...
One thing that occurred to me, though it's not related to this PR and we can track it as a separate JIRA: we shouldn't allow reconciliation of a container that is already reconciling. That can lead to other bugs where we update the metadata or the data in parallel.
...tainer-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
...va/org/apache/hadoop/ozone/dn/scanner/TestBackgroundContainerMetadataScannerIntegration.java
...er-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueHandler.java
Most of this should be handled for us by the ReplicationSupervisor and tested generally here. However, this depends on how we define the uniqueness of the reconciliation commands, and here I think we do have a problem as you mentioned, because we can get multiple simultaneous reconciliation tasks for the same container if the peer list is different. We should fix this in a follow-up. Update: Filed HDDS-13086 for this.
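One common shape for the guard discussed here is a concurrent set keyed by container ID, so a second reconciliation task for the same container is rejected while the first is still running. This is only an illustrative sketch of the idea behind the follow-up, not the actual fix filed as HDDS-13086; `tryReconcile` and the `Runnable` task body are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of a per-container guard preventing two reconciliation tasks
 * for the same container from running at once. Illustrative only; the
 * real fix is tracked separately and may look different.
 */
public class ReconcileGuardSketch {
  private static final Set<Long> RECONCILING = ConcurrentHashMap.newKeySet();

  /** Returns false if a reconciliation for this container is already running. */
  static boolean tryReconcile(long containerId, Runnable reconcile) {
    if (!RECONCILING.add(containerId)) {
      return false; // e.g. a duplicate command with a different peer list
    }
    try {
      reconcile.run();
    } finally {
      RECONCILING.remove(containerId); // always release the guard
    }
    return true;
  }

  public static void main(String[] args) {
    boolean outer = tryReconcile(1L, () ->
        // A second attempt for container 1 while the first is running fails.
        System.out.println("nested attempt allowed? " + tryReconcile(1L, () -> { })));
    System.out.println("outer attempt allowed? " + outer);
  }
}
```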
Thanks for the review @aswinshakil. All comments should be addressed. I still need to stabilize this test on our branch before trying to get green CI with this change.
...tainer-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
...r-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerScanHelper.java
Still save ICR for the end of reconciliation
The changes for the scan gap look good. I see there are some checkstyle-related CI failures.
What changes were proposed in this pull request?
Allow the container scanner to update the container Merkle Tree with the currently calculated hashes of what is on disk. If the tree does not already exist, it will be created. Even if the container is unhealthy, the merkle tree should still be updated to reflect the current state of the container. Hashes stored in RocksDB that were given by the client during write will not be modified.
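As a toy model of the overall idea, the sketch below combines per-chunk hashes the scanner observes on disk into a single container-level checksum. This is illustrative only: the real `ContainerMerkleTree` tracks blocks and chunks hierarchically and is persisted through the tree manager, and `containerChecksum` is a hypothetical helper, not the actual API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

/**
 * Toy model of the scan-time tree update: hash each chunk as it is read,
 * then fold the chunk hashes into one container checksum. Illustrative
 * only; the real merkle tree is hierarchical (container -> block -> chunk).
 */
public class ScanTreeSketch {

  /** Combines SHA-256 hashes of each chunk into a container-level digest. */
  static byte[] containerChecksum(List<byte[]> chunks) {
    try {
      MessageDigest combiner = MessageDigest.getInstance("SHA-256");
      for (byte[] chunk : chunks) {
        // Hash each chunk as the scanner reads it, feed into the parent hash.
        combiner.update(MessageDigest.getInstance("SHA-256").digest(chunk));
      }
      return combiner.digest();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 is required by the JDK", e);
    }
  }

  public static void main(String[] args) {
    byte[] sum = containerChecksum(List.of("c1".getBytes(), "c2".getBytes()));
    System.out.println(sum.length); // 32-byte SHA-256 digest
  }
}
```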
Specific changes:
- KeyValueContainerCheck has been modified to support building the merkle tree from the data the scanner sees.
- A KeyValueHandler.updateContainerChecksum method was added for scanners to call so that ICRs are sent for checksum updates even if the container's state does not change.
- KeyValueHandler now manages the in-memory data checksum update in the ContainerData object, so we don't need to do it from within ContainerChecksumTreeManager anymore.
- A ContainerScanHelper class was created to consolidate duplicate code among the scanners.
- ContainerMerkleTree can now be built to contain blocks without any chunks, which is necessary to represent states like an empty block file.
What is the link to the Apache JIRA
HDDS-10374
How was this patch tested?
This patch contains unit, integration, and acceptance test additions and modifications. Major ones to note:
- The KeyValueHandler#updateContainerChecksum method has a unit test added to check that both ICR and disk updates happen correctly.
- The TestContainerReconciliationWithMockDatanodes suite has been enabled after HDDS-12980. Add unit test framework for reconciliation. #8402
More robust testing of fault combinations will be added later using the framework being developed under HDDS-11942.