groupby on NaN-only column gives IndexError #11016

Closed
marcelm opened this issue Sep 7, 2015 · 3 comments
Labels
Bug Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

marcelm commented Sep 7, 2015

Using pandas from master (0.16.2+590.g81b647f) in Python 3.4.2, the following code raises IndexError (index out of bounds):

import pandas as pd, numpy as np
df = pd.DataFrame(dict(a=[np.nan]*3, b=[1,2,3]))
g = df.groupby(('a', 'b'))
len(g)  # IndexError

The same problem occurs when calling list(g) instead. Since NaN values are skipped according to the documentation, I guess the correct answer would be zero for len(g) and an empty list for list(g).

Strangely, iteration works: for x in g: pass (or [x for x in g]) raises no error and iterates zero times. Aggregations such as g.count() and g.sum() also work and return an empty DataFrame.

To add to the confusion, g.groups gives the dictionary {(nan, 1): [0], (nan, 2): [1], (nan, 3): [2]}. Shouldn’t this be empty because group keys with NaNs are dropped?

Grouping only by column 'a' or 'b' works and results in a length of 0 or 3, respectively.
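The expectation described above can be sketched without pandas. This is a minimal pure-Python model of "keys containing NaN are skipped"; `group_rows` and `has_nan` are hypothetical helpers written for illustration and are not pandas code, so the sketch only shows what the report expects, not how pandas computes it.

```python
import math


def has_nan(key):
    """Return True if any component of a group key is a float NaN."""
    return any(isinstance(k, float) and math.isnan(k) for k in key)


def group_rows(rows, by, dropna=True):
    """Map each group key to the list of row positions that share it.

    Hypothetical helper for illustration only; it is not pandas code.
    With dropna=True, rows whose key contains NaN are skipped.
    """
    groups = {}
    for i, row in enumerate(rows):
        key = tuple(row[col] for col in by)
        if dropna and has_nan(key):
            continue  # a key containing NaN does not form a group
        groups.setdefault(key, []).append(i)
    return groups


nan = float("nan")
# Same data as the report: column a is all NaN, column b is 1, 2, 3.
rows = [{"a": nan, "b": 1}, {"a": nan, "b": 2}, {"a": nan, "b": 3}]

both = group_rows(rows, by=("a", "b"))
only_a = group_rows(rows, by=("a",))
only_b = group_rows(rows, by=("b",))
kept = group_rows(rows, by=("a", "b"), dropna=False)

print(len(both), list(both.items()))  # 0 []
print(len(only_a), len(only_b))       # 0 3
print(len(kept))                      # 3
```

Under this model, grouping by both columns yields no groups because every key contains NaN, while keeping NaN keys (dropna=False in the sketch) yields three entries, which resembles the dictionary reported for g.groups.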

jreback (Contributor) commented Sep 8, 2015

The .groups output is correct: this is how things are recorded, and all groups are kept. But len raising is a bug.

jreback (Contributor) commented Sep 9, 2015

closed by #11031

jreback closed this as completed Sep 9, 2015

marcelm (Author) commented Sep 9, 2015

Thanks a lot for this really quick bug fix!
