BUG: Fix groupby duplicate column error message #8210
this is issue #7511
data=[range(4), range(2,6), range(0, 8, 2)])

grouped = df.groupby('A')
assert grouped.count().index.nlevels == 2
need to use self.assertTrue
rather than bare asserts
This needs to raise if instead of the specific check you are doing
ok, i updated following your request
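A minimal sketch of the testing style requested above, using only the standard library. `find_group_column` and its error message are hypothetical stand-ins for the code under test, not pandas functions; the point is only the contrast between bare asserts and `self.assertEqual` / `self.assertRaises`.

```python
import unittest


def find_group_column(columns, key):
    """Hypothetical stand-in for the code under test.

    Returns the position of `key` in `columns` and raises ValueError
    when the label appears more than once.
    """
    positions = [i for i, name in enumerate(columns) if name == key]
    if not positions:
        raise KeyError(key)
    if len(positions) > 1:
        # Illustrative message only; not the wording used by pandas.
        raise ValueError("duplicate column label %r" % (key,))
    return positions[0]


class TestDuplicateColumns(unittest.TestCase):
    def test_unique_label(self):
        # assertEqual reports both values on failure, and is not
        # stripped when Python runs with -O, unlike a bare assert.
        self.assertEqual(find_group_column(['A', 'B'], 'B'), 1)

    def test_duplicate_label_raises(self):
        # Passes only if ValueError is raised inside the block.
        with self.assertRaises(ValueError):
            find_group_column(['A', 'A', 'B'], 'A')
```

The second test checks that an error is raised at all rather than checking a specific property of the result, which is the behaviour the reviewer asks for.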
@@ -1922,6 +1922,8 @@ def __init__(self, index, grouper=None, obj=None, name=None, level=None,

# no level passed
if not isinstance(self.grouper, (Series, Index, np.ndarray)):
    if getattr(self.grouper,'ndim', 1) != 1:
        raise AssertionError("Grouper result with an ndim != 1")
make this a ValueError
groupings.append(ping)
if isinstance(gpr, DataFrame) and gpr.ndim > 1:
    for name, gpr in gpr.iteritems():
        ping = Grouping(group_axis, gpr, obj=obj, name=name, level=level, sort=sort)
This whole section you added should be taken out. If there is a duplicate grouper, it should simply raise an error (which can be even more informative if you'd like). It also creates a multi-grouper, which is really odd (and in a weird order).
Well, if pandas allows dataframes to have duplicated columns, the ability to group like this seems to me a consequence users would reasonably expect. In fact, I'm not even sure there is an easy (user-friendly) way to work around that limitation without altering or duplicating the data... But anyway, I don't mind much about that feature, and it's trivial to put back in the future if needed. So the current code now focuses only on the error message, as requested. See the last commit. Thanks
@bthyreau ok, will merge this, thanks. In response to your comment, the ambiguity is that you are saying to pandas: hey, I have multiple columns that I want to group by, but they have the same NAME (e.g. 'A', 'A' in your example). So the order of these is ambiguous. If you wanted to group this way you could
merged via 77d5f04
Ok Thanks
which, from the user's point of view, is arguably no more helpful than before (#7511). If that was not intended, I can reopen or create a new PR; just tell me. P.S. The code was:
I took that out as it is a very specific case; better to have a general message.
I understand, but then there is no way for the user to know what causes the error.
Sure, how about showing type(self.object)? We can't assume it would have a name.
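The compromise discussed above could look roughly like the sketch below: keep the general `ndim` check from the diff, raise `ValueError` as requested in review, and name the offending grouper by its name when one is available, falling back to its type otherwise. `check_grouper_is_1d`, `TwoDimensional` and the message wording are assumptions made for illustration, not the code that was merged.

```python
def check_grouper_is_1d(grouper, name=None):
    """Hypothetical helper: reject any grouper that is not 1-dimensional."""
    # Objects without an `ndim` attribute (lists, functions, ...) are
    # treated as 1-dimensional, mirroring the getattr default in the diff.
    if getattr(grouper, 'ndim', 1) != 1:
        # A grouper cannot be assumed to have a name, so fall back to
        # its type to give the user something to search for.
        label = name if name is not None else str(type(grouper))
        raise ValueError("Grouper for %r not 1-dimensional" % (label,))
    return grouper


class TwoDimensional(object):
    """Hypothetical stand-in for a 2-D selection, such as what a
    duplicated column label yields."""
    ndim = 2


# A plain list has no `ndim`, so it passes through unchanged.
check_grouper_is_1d([1, 2, 3])

try:
    check_grouper_is_1d(TwoDimensional(), name='A')
except ValueError as exc:
    print(exc)
```

With a name, the message points at the column; without one, it at least shows the type of object that caused the failure.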
closes #7511
Though having duplicated column names in a dataframe is never a good idea, it may happen, and that shouldn't confuse groupby() with a meaningless message. Currently
This patch fixes that.
Thanks.