-
Notifications
You must be signed in to change notification settings - Fork 535
HDDS-12356. granular locking framework for obs #8217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Why do we need yet another PR? |
@adoroszlai We are working on prototypes to discuss different approaches. I agree it is cluttering the main PR queue and I suggest others to raise the PR against their own fork and link it in a Jira comment like this instead cc @szetszwo |
That's fine. But this is the third PR from @sumitagrawl with basically the same text description. Having 3 different approaches is OK even from same developer, but please describe properly. Based on commits, this seems to be a continuation of #8176. There is absolutely no need to open a new PR for addressing review comments, it just splits the conversation unnecessarily. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sumitagrawl. I'm still not sure what the story on the lock stats is. Are we planning to expose these as metrics? They would have to be aggregated so I'm not sure how helpful that would be. Are we planning to log warnings when wait time or hold time crosses a given value? That would probably be useful. How we use the lock stats will depend on whether we need a long running counter to aggregate the information, or a new set of stats for each lock acquisition. Maybe we will need both.
keyLocks = SimpleStriped.readWriteLock(NUM_KEY_STRIPES, false); | ||
} | ||
|
||
public OmLockObject lock(OmLockInfo lockInfo) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why return OmLockObject
and require caller to pass it back to unlock
instead of using Autocloseable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defined OmLockObject as autoclosable, so based on usecase, it can be used with try-with-resouce.
We need return this object as it holds LockStats which is required to be set to HadoopRPC metrics with each request calling this interface.
This might be added to ozone metrics also to capture lock stats.
|
||
public OmLockObject lock(OmLockInfo lockInfo) throws IOException { | ||
OmLockObject omLockObject = new OmLockObject(lockInfo); | ||
List<Lock> locks = omLockObject.getLocks(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing initializes the OmLockObject
lock list now and we already have the locks inside the stripes in this class. Why would we try to get them out of OmLockObject
, or even add them there in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As response for above comments, its required as,
- it keep track of lock stats to be responsed to Hadoop Call metrics for each request
public static class OmLockObject { | ||
private final OmLockInfo omLockInfo; | ||
private final List<Lock> locks = new ArrayList<>(); | ||
private final OmLockStats lockStats = new OmLockStats(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these fields intended to be static?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, as this is referred for each request, and can not be static.
return writeLockNanos; | ||
} | ||
|
||
void add(long timeNanos, Type type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove the Type
enum all together and just create one method for updating each stat instead of one method to unpack the enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@sumitagrawl , I am also confused for these similar PRs. We should have a better description on why there are three PRs and what are difference between them, and close the previous PRs if there are no longer needed.
Agree. It is even better to keep using the same PR. |
This work to re-architect the locking should probably be performed on a branch and merged to master when it is completed and tested, in order to keep master shipable. |
This is done incrementally with no impact to existing flow. Original PR is splitted for merge as suggested to @szetszwo for better review and merge back to master in continuation. |
@errose28 @szetszwo @swamirishi PR is updated is all review fix. |
The introduction of locking is not re architecting but adding the capability to serialize in an alternate scheme. Going with feature branch might be too heavy handed for this. |
@kerneltime Recall the addition of a much smaller change to introduce atomic rewrite, where yourself and @errose28 insisted on doing the development on a branch. That change was much smaller and less impactful than this one. While I agree this change is "framework code" and will not be impactful to existing logic, the followup tasks might be impactful and done in several PRs if they result in large changes:
In general I don't agree with pushing development to a branch too quickly, because It is a pain to get the branch merged as it requires a vote. However, we do need consistency on the project and to decide on what requires a branch and what does not. Especially if the goal is to always have "master shippable". Perhaps the test is, if a PR can leave master in a state where it is not shippable without another PR (eg half the code using the old locking and half using new locking), then the feature should be on a branch, even if the branch is only 3 or 4 commits before merging back to master. Another question is whether such a branch should need a vote to merge? I'd argue that a short lived branch with a handful of commits is a useful thing and should not require a vote to merge - it encourages small PRs which are easier to review and build up the feature in a more natural way. It would make using branches for this sort of thing light weight and tool to be used rather than a burden. That all said, I am not involved in this area of work and I am not following how involved or risky this change is overall. |
@sodonnel I am actually in favor of doing this initial set of OBS migration tasks on a feature branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sumitagrawl , thanks for working on this! Copied my comments on errose28#5 to here.
BTW, could you please address my comments posted on May 27 in #7583 ? If we are not clear about the design, it does not make sense to start writing code.
*/ | ||
public static final class LockInfo implements Comparable<LockInfo> { | ||
private final String name; | ||
private final boolean isWriteLock; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use an enum is better than a boolean; see "Never Use Booleans for Something That Has Two States Now, but Might Have More Later".
public OmLockObject lock(OmLockInfo lockInfo) throws IOException { | ||
OmLockObject omLockObject = new OmLockObject(lockInfo); | ||
long startTime = Time.monotonicNowNanos(); | ||
Optional<OmLockInfo.LockInfo> optionalVolumeLock = lockInfo.getVolumeLock(); | ||
Optional<OmLockInfo.LockInfo> optionalBucketLock = lockInfo.getBucketLock(); | ||
Optional<Set<OmLockInfo.LockInfo>> optionalKeyLocks = lockInfo.getKeyLocks(); | ||
List<Lock> locks = new ArrayList<>(); | ||
|
||
if (optionalVolumeLock.isPresent()) { | ||
OmLockInfo.LockInfo volumeLockInfo = optionalVolumeLock.get(); | ||
if (volumeLockInfo.isWriteLock()) { | ||
omLockObject.setReadStatsType(false); | ||
locks.add(volumeLocks.get(volumeLockInfo.getName()).writeLock()); | ||
} else { | ||
locks.add(volumeLocks.get(volumeLockInfo.getName()).readLock()); | ||
} | ||
} | ||
|
||
if (optionalBucketLock.isPresent()) { | ||
OmLockInfo.LockInfo bucketLockInfo = optionalBucketLock.get(); | ||
if (bucketLockInfo.isWriteLock()) { | ||
omLockObject.setReadStatsType(false); | ||
locks.add(bucketLocks.get(bucketLockInfo.getName()).writeLock()); | ||
} else { | ||
locks.add(bucketLocks.get(bucketLockInfo.getName()).readLock()); | ||
} | ||
} | ||
|
||
if (optionalKeyLocks.isPresent()) { | ||
for (ReadWriteLock keyLock: keyLocks.bulkGet(optionalKeyLocks.get())) { | ||
omLockObject.setReadStatsType(false); | ||
locks.add(keyLock.writeLock()); | ||
} | ||
} | ||
|
||
try { | ||
acquireLocks(locks, omLockObject.getLocks()); | ||
lockStatsBegin(omLockObject.getLockStats(), Time.monotonicNowNanos(), startTime); | ||
} catch (InterruptedException e) { | ||
Thread.currentThread().interrupt(); | ||
throw new OMException("Waiting for locks is interrupted, " + lockInfo, OMException.ResultCodes.INTERNAL_ERROR); | ||
} catch (TimeoutException e) { | ||
throw new OMException("Timeout occurred for locks " + lockInfo, OMException.ResultCodes.TIMEOUT); | ||
} | ||
return omLockObject; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lock overhead probably is quite high in this method. It may not be able to improve the performance much.
return omLockObject; | ||
} | ||
|
||
private void acquireLocks(List<Lock> locks, Stack<Lock> acquiredLocks) throws TimeoutException, InterruptedException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concurrently, if the lock.tryLock(..)
is interrupted, the partial lock won't be released. Getting everything right with timeout is not easy.
If we are adding timeout support, we have to update the design doc first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added section for having timeout in design doc for lock
Comparing this PR with #8192 :
|
Benchmark program for this PR: https://issues.apache.org/jira/secure/attachment/13075848/8217_benchmark.patch |
UseEpsilonGC (no gc): https://blogs.oracle.com/javamagazine/post/epsilon-the-jdks-do-nothing-garbage-collector
|
Some analysis of the benchmark would help, i.e. where is the memory savings coming from? Based on an initial look I think |
@sumitagrawl is away for a few days so the updates will be incoming after that. |
@errose28 , thanks for looking at the benchmark results!
The memory consumption is from the multiple unnecessary data structures used in this PR.
According to @kerneltime and @sumitagrawl , most cases only have 1 key. That's why I used a smaller bucketKeyListCount compared to bucketKeyCount. We may try different combinations. Any suggestions? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sumitagrawl thanks for the patch. Left some comments inline
public OmRequestGatekeeper() { | ||
volumeLocks = SimpleStriped.readWriteLock(NUM_VOLUME_STRIPES, false); | ||
bucketLocks = SimpleStriped.readWriteLock(NUM_BUCKET_STRIPES, false); | ||
keyLocks = SimpleStriped.readWriteLock(NUM_KEY_STRIPES, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are number of stripes not configurable?
for (Lock lock: locks) { | ||
if (lock.tryLock(LOCK_TIMEOUT_DEFAULT, TimeUnit.MILLISECONDS)) { | ||
try { | ||
acquiredLocks.add(lock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use acquiredLocks.push() instead
acquiredLocks.add(lock); | ||
} catch (Throwable e) { | ||
// We acquired this lock but were unable to add it to our acquired locks list. | ||
lock.unlock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This catch block seems unnecessary as no exception would be thrown
What changes were proposed in this pull request?
Granular locking framework for OBS for existing flow. Its just framework binding code flow but pending to call the lock in respective flow.
locking is added for external request at entry point. This provides execution of request at leader and existing flow simultaneouly without impacting for cache.
locking added for obs request
(HDDS-11898. design doc leader side execution)Step-by-step integration
of existing request (interoperability)Next PR will include:
Parent Jira:
https://issues.apache.org/jira/browse/HDDS-11900
Its Parent for Epic;
https://issues.apache.org/jira/browse/HDDS-11897
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-12356
How was this patch tested?