-
Notifications
You must be signed in to change notification settings - Fork 392
[CELEBORN-1903] Enable connection heartbeat and close idle connection #3148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
This issue was closed because it has been staled for 10 days with no activity. |
@Austinfjq Hi, can you fix the UTs first? This PR will be helpful. |
@FMX Hi, I pushed the fix. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enables connection heartbeat and the closing of idle connections via new configuration options for different modules.
- Added configuration options for idle connection closure and heartbeat enabling in the docs.
- Introduced new methods in TransportConf and updated TransportContext to account for the new configurations.
- Added a logging statement in TransportChannelHandler to report when heartbeat is enabled.
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated no comments.
File | Description |
---|---|
docs/configuration/network.md | Added new configuration options for closing idle connections and enabling heartbeat |
common/src/main/java/org/apache/celeborn/common/network/util/TransportConf.java | Added new methods channelHeartbeatEnabled() and closeIdleConnections() to access the new configs |
common/src/main/java/org/apache/celeborn/common/network/server/TransportChannelHandler.java | Inserted logging to indicate heartbeat enabling |
common/src/main/java/org/apache/celeborn/common/network/TransportContext.java | Updated initialization logic for idle connections and heartbeat settings |
Files not reviewed (1)
- common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala: Language not supported
Comments suppressed due to low confidence (1)
docs/configuration/network.md:22
- Typo in the description: 'by by default' should be corrected to 'by default'.
| celeborn.<module>.closeIdleConnections | false | false | Should idle connections be closed by by default for the module. "<module>" could be client, worker, etc. If you would like to provide a fallback value for all modules, please omit the ".<module>" and use "celeborn.closeIdleConnections". | 0.6.0 | |
@Austinfjq, IMO, we could deprecate |
@@ -86,12 +86,19 @@ public TransportContext( | |||
AbstractSource source) { | |||
this.conf = conf; | |||
this.msgHandler = msgHandler; | |||
this.closeIdleConnections = closeIdleConnections; | |||
this.closeIdleConnections = closeIdleConnections || conf.closeIdleConnections(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to change the invoke of this contructor instead of changing here.
@Austinfjq, any update? Please resolve the conflict of the latest main branch. |
Hi @Austinfjq We plan to release 0.6.0 soon, could you address the comments? |
5590ef0
to
0dffcf6
Compare
…eartbear # Conflicts: # common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala # docs/configuration/network.md
…tch.heartbeat.enabled
…oseIdleConnections
@SteNicholas Thank you for the fix in #3239. Sorry for the late reply. Updated the pull request. |
could you check the CI failure? |
### What changes were proposed in this pull request? Support min number of workers to assign slots on for a shuffle. ### Why are the changes needed? PR apache#3039 updated the default value of `celeborn.master.slot.assign.extraSlots` to avoid skew in shuffle stage with less number of reducers. However, it will also affect the stage with large number of reducers, thus not ideal. We are introducing a new config `celeborn.master.slot.assign.minWorkers` which will ensure that shuffle stages with less number of reducers will not cause load imbalance on few nodes. ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? NA Closes apache#3297 from s0nskar/min_workers. Authored-by: Sanskar Modi <[email protected]> Signed-off-by: SteNicholas <[email protected]>
### What changes were proposed in this pull request? Support dependencies of `spark-4.0` profile. Follow up apache#3282. ### Why are the changes needed? apache#3282 is lack of dependencies support of `spark-4.0` profile. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Dependencies check: maven-jdk17 (spark-4.0). Closes apache#3298 from SteNicholas/CELEBORN-1413. Authored-by: SteNicholas <[email protected]> Signed-off-by: SteNicholas <[email protected]>
What changes were proposed in this pull request?
Add new configs to enable connection heartbeat and close idle connection.
Why are the changes needed?
Enable client side config which will allow all connections to enable heartbeat and detect connection failure.
Does this PR introduce any user-facing change?
Yes. Added 2 new configs.
How was this patch tested?
Tested in a Flink batch job.