ENH: add validate as a param to join (#46622) #46740


Merged (8 commits) on Apr 27, 2022

@gaotian98 (Contributor) commented Apr 11, 2022

@mroeschke added the Enhancement and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Apr 12, 2022
@mroeschke added this to the 1.5 milestone Apr 12, 2022
@gaotian98 (Contributor Author)

Does it block the merge if one code check fails? The failure seems to come from a file I never touched, while the previous two commits passed all checks.

@mroeschke (Member)

This was a known failure affecting all builds. Just waiting to see whether other maintainers would like to review.

@gaotian98 (Contributor Author)

Sure. Thank you!

```
@@ -104,6 +107,126 @@ def test_suffix_on_list_join():
    tm.assert_frame_equal(arr_joined, norm_joined)


def test_join_on_single_col_check_dup():
    # GH 46622
```
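The test name suggests it covers duplicate keys when joining on a single column through `on=`. Below is a minimal sketch of that scenario, assuming pandas 1.5 or later (where `DataFrame.join` accepts `validate`). The frames are made-up illustration data, not the body of the PR's test.

```python
import pandas as pd
from pandas.errors import MergeError

# Illustrative data: the join key is a regular column on the left,
# and "a" appears twice.
left = pd.DataFrame({"key": ["a", "a", "b"], "lval": [1, 2, 3]})
# The right frame is indexed by the key; each key appears once.
right = pd.DataFrame({"rval": [10, 20]}, index=pd.Index(["a", "b"], name="key"))

# Column-on-index join: duplicate left keys are allowed by "many_to_one",
# which only requires the right-hand keys to be unique.
result = left.join(right, on="key", validate="many_to_one")

# "one_to_one" also requires unique left keys, so this call raises.
try:
    left.join(right, on="key", validate="one_to_one")
except MergeError as err:
    print(f"MergeError: {err}")
```

Without `validate`, both calls would return the same three-row result silently. The parameter only adds a check; it does not change how rows are matched.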
Member

Could you either try parametrizing this or splitting it into multiple tests? This is hard to debug if something goes wrong.

Contributor Author

If this test fails, it's easy to trace the issue from the failing line and the corresponding comments in this function. I can't think of an easy way to parametrize it that doesn't make it messier. Could you double check?

Member

Multiple tests are fine too.

If the first check fails, none of the remaining checks are executed. Also, when a check further down fails, it is not easy to see what the DataFrame looks like.
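For reference, a parametrized layout along the lines the reviewer suggests might look like the sketch below. The test names and the `LEFT`/`RIGHT` frames are hypothetical, not the tests merged in this PR, and the sketch assumes pandas 1.5+ and pytest. Because each parameter value becomes its own test case, a failure in one case is reported separately instead of stopping every check after it.

```python
import pandas as pd
import pytest
from pandas.errors import MergeError

# Hypothetical fixtures: LEFT has a duplicated key "a", RIGHT has unique keys.
LEFT = pd.DataFrame({"lval": [1, 2, 3]}, index=pd.Index(["a", "a", "b"], name="key"))
RIGHT = pd.DataFrame({"rval": [10, 20]}, index=pd.Index(["a", "b"], name="key"))


@pytest.mark.parametrize("validate", ["many_to_one", "m:1", "many_to_many", "m:m"])
def test_join_validate_passes(validate):
    # These checks never require unique left keys, so the join succeeds.
    result = LEFT.join(RIGHT, validate=validate)
    assert result["rval"].tolist() == [10, 10, 20]


@pytest.mark.parametrize("validate", ["one_to_one", "1:1", "one_to_many", "1:m"])
def test_join_validate_raises(validate):
    # Left keys are not unique, so any check requiring a unique left side raises.
    with pytest.raises(MergeError, match="not unique in left dataset"):
        LEFT.join(RIGHT, validate=validate)
```

Splitting into separate, non-parametrized test functions (the other option the reviewer mentions) gives the same isolation; parametrizing just avoids repeating the setup for each spelling of `validate`.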

Contributor Author

Just changed it. Does it look better now? Thanks.

Member

This is still one big test?

Contributor Author

Oh, sorry. I forgot to add test_join.py after resolving the conflict with upstream/main. Please check again.

@mroeschke (Member) left a comment

LGTM. Looks good to merge pending another look from @phofl.

@jreback (Contributor) left a comment

LGTM, with some comments. Ping on green.

@jreback (Contributor) commented Apr 26, 2022

Looks good. It might be worth adding something to https://pandas.pydata.org/pandas-docs/dev/user_guide/merging.html?highlight=join#joining-on-index (there are a lot of sections, so this is a maybe).
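A user-guide style example for the joining-on-index section might look like the sketch below. It assumes pandas 1.5+, and the frames are invented for illustration. The right index repeats `K1`, so a one-to-many join succeeds while a one-to-one check raises `MergeError`.

```python
import pandas as pd
from pandas.errors import MergeError

left = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=["K0", "K1", "K2"])
# "K1" is duplicated in the right-hand index.
right = pd.DataFrame({"C": ["C0", "C1", "C1b"]}, index=["K0", "K1", "K1"])

# Left keys are unique and right keys are not: "one_to_many" is accepted,
# and the duplicated key produces two result rows for "K1".
result = left.join(right, validate="one_to_many")

# A strict one-to-one check surfaces the duplicate instead of silently
# adding rows to the result.
try:
    left.join(right, validate="one_to_one")
except MergeError as err:
    print(err)
```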

@jreback merged commit 9ad8150 into pandas-dev:main Apr 27, 2022
@jreback (Contributor) commented Apr 27, 2022

Thanks @gaotian98, very nice!

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Linked issue that this pull request may close: ENH: Join - Add a parameter to check for duplicates

4 participants