-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
ENH: add validate as a param to join (#46622) #46740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Does it matter for the change to be merged if one code check fails? The failure seems to come from a file I never touched, while the previous two commits were able to pass all checks. |
This was a known failure affecting all builds. Just waiting on if other maintainers would like to review |
Sure. Thank you! |
@@ -104,6 +107,126 @@ def test_suffix_on_list_join(): | |||
tm.assert_frame_equal(arr_joined, norm_joined) | |||
|
|||
|
|||
def test_join_on_single_col_check_dup(): | |||
# GH 46622 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you either try parametrizing this or creating multiple tests? This is hard to debug if something goes wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this test fails, it's easy to trace the issue with the line and corresponding comments in this function. I don't think of an easy way of parametrizing that does not make this messier. Could you double check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple tests are fine too.
If the first fails all other checks are not executed. Also it is not easy to see how the df looks if a check further below fails
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just changed. Does it look better now? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still one big test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry. Forgot to add test_join.py after resolving conflict with the upstream/main. Please check again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Looks good to merge pending another look from @phofl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm some comments, ping on green.
looks good, it might be worth adding something to https://pandas.pydata.org/pandas-docs/dev/user_guide/merging.html?highlight=join#joining-on-index (there is a lot of sections, so this is a maybe). |
thanks @gaotian98 very nice! |
closes ENH: Join - Add a parameter to check for duplicates #46622
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added an entry in the latest
doc/source/whatsnew/v1.5.0.rst
file if fixing a bug or adding a new feature.