[c++] Segmentation fault when using the parallel_tree_learner method when the number of categories of one feature is larger than 28 (default max_cat_to_onehot-4). #6491
Comments
I'm also having this issue while trying to use lightgbm ^4.0.0 with a Ray cluster: the worker crashes with a segfault in DataParallelTreeLearner. I ended up changing split_info.hpp as stated above and building from source, which fixes the issue, but it would be great if this could be fixed in master. I tried to write a minimal example to reproduce the issue.
This makes the Ray worker crash with the following log. Interestingly, a warning is printed that categorical_feature is ignored, even though it still seems to cause issues.
Yeah, this error is caused by the code above. #6491 (comment) The problem was introduced in lightgbm>=4.0.
@moming39 you seem to have a clear idea of where the bug is and what caused it. Would you like to submit a pull request with a fix?
microsoft#6491 (comment) This code computes the size of the buffer used to communicate the split info in distributed training, but it omits the size of the two most recently added fields (right_sum_gradient_and_hessian, left_sum_gradient_and_hessian). When the parallel_tree_learner method is used and one feature has more than 28 categories (default max_cat_to_onehot(32)-2*2), this results in a segmentation fault.
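The arithmetic behind the "28" can be sketched as follows. This is a simplified illustration, not the actual contents of split_info.hpp: `kOtherFieldsBytes`, `BuggySize`, `FixedSize`, and `FitsInBuggyBuffer` are hypothetical names, the header size is a placeholder, and the sketch assumes each category slot is stored as a `uint32_t` and each of the two omitted sums as an `int64_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Placeholder for the bytes taken by all other serialized fields.
// The real value does not matter here; only the difference does.
constexpr std::size_t kOtherFieldsBytes = 64;

// Assumed width of the two omitted sums (two int64_t values).
constexpr std::size_t kMissingBytes = 2 * sizeof(std::int64_t);

// Size estimate that forgets the two sums (the reported bug, simplified).
constexpr std::size_t BuggySize(int max_cat) {
  return kOtherFieldsBytes +
         static_cast<std::size_t>(max_cat) * sizeof(std::uint32_t);
}

// Size estimate that also reserves room for the two sums.
constexpr std::size_t FixedSize(int max_cat) {
  return BuggySize(max_cat) + kMissingBytes;
}

// Does a split with num_cat category slots still fit into a buffer
// that was allocated with the buggy size estimate?
inline bool FitsInBuggyBuffer(int num_cat, int max_cat) {
  assert(num_cat >= 0 && num_cat <= max_cat);
  const std::size_t written =
      kOtherFieldsBytes + kMissingBytes +
      static_cast<std::size_t>(num_cat) * sizeof(std::uint32_t);
  return written <= BuggySize(max_cat);
}

// The shortfall is 16 bytes, which equals four uint32_t category slots.
static_assert(FixedSize(32) - BuggySize(32) == 16, "two int64_t sums");
static_assert(kMissingBytes / sizeof(std::uint32_t) == 4, "2*2 slots");
```

Under these assumptions, `FitsInBuggyBuffer(28, 32)` holds while `FitsInBuggyBuffer(29, 32)` does not, which matches the report that the crash appears once a feature has more than 28 categories.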
Closing this based on the fix in #6738 and @shiyu1994's approval there. This fix will go out in the next release of LightGBM. If you build the latest development version and find that it is still not fixed, please comment and we can re-open it.
LightGBM/src/treelearner/split_info.hpp
Line 56 in 5cd95a5
This code computes the size of the buffer used to communicate the split info in distributed training, but it omits the size of the two most recently added fields (right_sum_gradient_and_hessian, left_sum_gradient_and_hessian).
When the parallel_tree_learner method is used and one feature has more than 28 categories (default max_cat_to_onehot(32)-2*2), this results in a segmentation fault.
This code must be,