HDDS-8211. S3G QoS #4421
@mohan3d Thanks for working on this. Can you add more details about the manual test? Were you able to get it to drop requests manually?
FairCallQueue, if backoff is enabled, rejects requests when the queues are full. If a user has been submitting a lot of requests but the FCQ priority queues are not full, the system still processes these requests but less frequently.
If I understand your approach correctly, you are rejecting requests based on how many the user has already submitted. If a user has exceeded the maximum number of requests, you start dropping any new ones they submit. This raises a concern that it might drop requests that it shouldn't.
@xBis7 Thanks for reviewing this.
@mohan3d FairCallQueue has multiple priority queues (4 by default) and requests are placed into those queues based on their priorities. Your approach simulates the behavior of Let's say we have only 1 user and max requests is set to 10000. If the user exceeds that number of requests, then your filter will start rejecting all new requests coming from that user, while it shouldn't, since he is the only one stressing the system.
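To make the priority-queue point above concrete, here is a minimal sketch of assigning a request to one of 4 priority levels from the caller's share of all calls seen so far. It is not the Hadoop FairCallQueue code: the class name, the threshold values and the absence of decay are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not the actual FairCallQueue implementation. */
final class SharePrioritySketch {
  // Assumed share thresholds separating 4 priority levels.
  private static final double[] THRESHOLDS = {0.125, 0.25, 0.5};

  private final Map<String, Long> callsPerUser = new HashMap<>();
  private long totalCalls;

  /** Records one call and returns its queue index (0 = highest priority). */
  int recordAndGetPriority(String user) {
    long userCalls = callsPerUser.merge(user, 1L, Long::sum);
    totalCalls++;
    double share = (double) userCalls / totalCalls;
    int priority = 0;
    for (double threshold : THRESHOLDS) {
      if (share >= threshold) {
        priority++; // heavier users sink to lower-priority queues
      }
    }
    return priority;
  }
}
```

In this sketch a heavy user is only moved to a lower-priority queue and is never rejected by the scheduler itself. With a single user nothing competes for the handlers, so that user's requests are still served.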
@xBis7 Good example. In that same example, what should happen when the user is making too many requests, assuming that FairCallQueue is being used with backoff enabled? I think it will end up with the queues full (if the request rate is too high) and it will start rejecting the requests from this single user?
@mohan3d To be honest, I haven't tested such a scenario, but for the requests to fail with a single user, it means that the system fails. I would expect it to slow down processing and not fail entirely. Also, I can see that the number of handlers is updated while processing requests, but is the number of requests per user also getting decremented?
@xBis7 So the correct idea should be to slow down processing when the system is too busy, instead of rejecting requests? I am trying to understand more about the expected behavior. I might be able to update the solution.
Yeah, both will be updated the way you mentioned. Requests per user will be reduced by a factor after each interval.
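A rough sketch of the per-user accounting described here, in which each user's count is reduced by a factor after each interval. It is not the code from this PR: the class name, method names and parameters are hypothetical, and the interval timer is left to the caller.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a per-user request counter with periodic decay. */
final class DecayingUserCounterSketch {
  private final Map<String, Long> counts = new ConcurrentHashMap<>();
  private final long maxRequests;   // assumed per-user limit
  private final double decayFactor; // assumed factor in (0, 1), e.g. 0.5

  DecayingUserCounterSketch(long maxRequests, double decayFactor) {
    this.maxRequests = maxRequests;
    this.decayFactor = decayFactor;
  }

  /** Counts the request and returns false once the user is over the limit. */
  boolean tryAcquire(String user) {
    return counts.merge(user, 1L, Long::sum) <= maxRequests;
  }

  /** Intended to be called once per interval, e.g. from a scheduled task. */
  void decay() {
    counts.replaceAll((user, count) -> (long) (count * decayFactor));
    counts.values().removeIf(count -> count == 0L); // forget idle users
  }

  long count(String user) {
    return counts.getOrDefault(user, 0L);
  }
}
```

Note that in this sketch rejected requests still increase the count, so a user who keeps retrying stays throttled until enough intervals pass. Whether that is desirable is a design choice.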
@mohan3d Yes, you could also be rejecting requests but at the point where the system is at full capacity. When you have only 1 user, FCQ doesn't slow him down or drop his requests.
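One way to read "reject only at full capacity" is a gate on the number of in-flight requests rather than on per-user totals. The following is a minimal sketch under that assumption. The class and method names are hypothetical and the capacity value is for the caller to choose.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical sketch: admit requests until all slots are busy. */
final class CapacityGateSketch {
  private final Semaphore slots;

  CapacityGateSketch(int capacity) {
    this.slots = new Semaphore(capacity);
  }

  /** Returns true if a slot was free; false means the gateway is full. */
  boolean tryEnter() {
    return slots.tryAcquire();
  }

  /** Must be called when an admitted request finishes. */
  void exit() {
    slots.release();
  }
}
```

With such a gate, a single user can use the whole capacity and is only turned away while every slot is occupied.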
@mohan3d Thanks a lot for the patch. Please let us know if you plan to continue working on this. The tests will need to be migrated to use JUnit5 -- we can do that for you if the PR is not abandoned.
@adoroszlai Yeah, I would like to continue working on it. Aside from migrating the tests, what else do I need to update to get the PR merged?
Thanks @mohan3d for continued interest in this.
Done, pushed to your fork.
I'm not familiar with FCQ, so I'll defer review to @duongkame, @kerneltime and @xBis7.
I have left a minor comment.
ctx.abortWith(Response
    .status(Response.Status.SERVICE_UNAVAILABLE)
    .entity("Too many requests")
    .build());
Just a minor suggestion: we can wrap this in an OS3Exception according to the SlowDown error code. A new OS3Exception can be defined in the S3ErrorTable (https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html).

The AuthorizationFilter#wrapOS3Exception can be reused by moving it to S3Utils.
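For reference, a self-contained sketch of the S3-style error body that a SlowDown response would carry. It does not use OS3Exception or S3ErrorTable, whose exact APIs are not shown in this thread. The class below is hypothetical and only illustrates the error code, the HTTP 503 status and the XML shape from the AWS error response documentation.

```java
/** Hypothetical sketch of an S3 "SlowDown" error body; not Ozone code. */
final class SlowDownErrorSketch {
  static final int HTTP_STATUS = 503;
  static final String CODE = "SlowDown";
  static final String MESSAGE = "Please reduce your request rate.";

  /** Builds the XML error document for the given resource and request id. */
  static String toXml(String resource, String requestId) {
    return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<Error>"
        + "<Code>" + CODE + "</Code>"
        + "<Message>" + MESSAGE + "</Message>"
        + "<Resource>" + escape(resource) + "</Resource>"
        + "<RequestId>" + escape(requestId) + "</RequestId>"
        + "</Error>";
  }

  // Minimal escaping for text nodes; '&' is replaced first.
  private static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }
}
```

Returning a structured error like this, instead of the plain "Too many requests" entity, lets S3 clients recognise the condition and apply their own backoff.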
Thanks a lot @ivandika3 for your review and the suggested improvement.
@ivandika3 I pushed some updates, please let me know if it needs further changes.
Thanks for the update. Looks good.
What changes were proposed in this pull request?
Added a throttler package that implements a request scheduler and some utilities, plus a new filter in S3 that accepts or rejects user requests depending on the load of the S3 gateway and each user's consumption.
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-8211
How was this patch tested?
Unit tests and a manual test.