KeyError when using str.cat and index was changed #7857

Closed
toobaz opened this issue Jul 28, 2014 · 5 comments · Fixed by #7902
Labels
Bug Strings String extension data type and string data

@toobaz
Member

toobaz commented Jul 28, 2014

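from pandas import DataFrame, MultiIndex

# Build a frame indexed by (year, month), then move both levels into columns.
df = DataFrame(index=MultiIndex.from_product([[2011, 2012], [1, 2, 3]],
                                             names=['year', 'month']))

df = df.reset_index()

str_year = df.year.astype('str')
str_month = df.month.astype('str')
str_both = str_year.str.cat(str_month, sep=' ')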

So far, everything is fine. Now filter the rows, so that the index no longer contains the label 0, and retry:

df = df[df.month > 1]

str_year = df.year.astype('str')
str_month = df.month.astype('str')
str_both = str_year.str.cat(str_month, sep=' ')

This raises a KeyError (tested against git, commit 90fa87e):

KeyError                                  Traceback (most recent call last)
<ipython-input-12-9d3f1fbb70fc> in <module>()
     11 str_year = df.year.astype('str')
     12 str_month = df.month.astype('str')
---> 13 str_both = str_year.str.cat(str_month, sep=' ')

/home/pietro/nobackup/repo/pandas/pandas/core/strings.py in cat(self, others, sep, na_rep)
    933     @copy(str_cat)
    934     def cat(self, others=None, sep=None, na_rep=None):
--> 935         result = str_cat(self.series, others=others, sep=sep, na_rep=na_rep)
    936         return self._wrap_result(result)
    937 

/home/pietro/nobackup/repo/pandas/pandas/core/strings.py in str_cat(arr, others, sep, na_rep)
     41 
     42     if others is not None:
---> 43         arrays = _get_array_list(arr, others)
     44 
     45         n = _length_check(arrays)

/home/pietro/nobackup/repo/pandas/pandas/core/strings.py in _get_array_list(arr, others)
     13 
     14 def _get_array_list(arr, others):
---> 15     if len(others) and isinstance(others[0], (list, np.ndarray)):
     16         arrays = [arr] + list(others)
     17     else:

/home/pietro/nobackup/repo/pandas/pandas/core/series.py in __getitem__(self, key)
    491     def __getitem__(self, key):
    492         try:
--> 493             result = self.index.get_value(self, key)
    494 
    495             if not np.isscalar(result):

/home/pietro/nobackup/repo/pandas/pandas/core/index.py in get_value(self, series, key)
   1194 
   1195         try:
-> 1196             return self._engine.get_value(s, k)
   1197         except KeyError as e1:
   1198             if len(self) > 0 and self.inferred_type in ['integer','boolean']:

/home/pietro/nobackup/repo/pandas/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2991)()

/home/pietro/nobackup/repo/pandas/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2806)()

/home/pietro/nobackup/repo/pandas/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3532)()

/home/pietro/nobackup/repo/pandas/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:7033)()

/home/pietro/nobackup/repo/pandas/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6974)()

KeyError: 0

jreback added this to the 0.15.0 milestone on Jul 28, 2014
@jreback
Contributor

jreback commented Jul 28, 2014

This looks like a bug in the implementation of _get_array_list in core/strings.py, which checks whether the 0th element is a list/ndarray. It should probably be others.values[0].

Care to do a PR?
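
To illustrate the difference between the two lookups, here is a minimal sketch. The Series `s` is a made-up stand-in for the filtered `str_month` from the report; it is not taken from the pandas source.

```python
import pandas as pd

# Hypothetical stand-in for a filtered column: the integer index
# starts at 1, so the label 0 does not exist.
s = pd.Series(["2", "3"], index=[1, 2])

try:
    s[0]  # on an integer index this is a label lookup, not a positional one
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Going through .values bypasses the index and reads by position.
first_by_position = s.values[0]

print(label_lookup_failed, first_by_position)
```

The label lookup fails because no row is labelled 0, while the positional read still returns the first stored value.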

@toobaz
Member Author

toobaz commented Jul 28, 2014

Indeed, this fixes the issue (although I must admit I didn't exert much effort in understanding the code apart from that line).

@jreback
Contributor

jreback commented Jul 28, 2014

Hah... the problem is that since others is a Series, others[0] looks up the index label 0 (which the filtered Series no longer has, hence the KeyError). The code is trying to figure out whether the 0th element is itself a list or ndarray (rather than a scalar).

Actually, maybe this should be

_values_from_object(others)[0], because then others could be an ndarray OR a Series (I don't know if that's possible, but this guards against it).
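
A rough sketch of that idea follows. The helpers `first_element` and `holds_arrays` are hypothetical names invented for illustration, and `others.values` stands in for the internal `_values_from_object`; this is not the code that was merged.

```python
import numpy as np
import pandas as pd


def first_element(others):
    """Return the positionally first element of a list, ndarray, or Series."""
    if isinstance(others, pd.Series):
        # Read from the underlying values so index labels are ignored.
        return others.values[0]
    return others[0]


def holds_arrays(others):
    """True if `others` is a non-empty collection of lists/ndarrays."""
    return len(others) > 0 and isinstance(first_element(others),
                                          (list, np.ndarray))


# A Series without the label 0 no longer raises KeyError.
filtered = pd.Series(["2", "3"], index=[1, 2])
series_result = holds_arrays(filtered)

# A list of arrays is still recognised as "several others".
list_result = holds_arrays([np.array(["a", "b"]), np.array(["c", "d"])])

print(series_result, list_result)
```

The point of the sketch is only that the type check reads by position, so it behaves the same whether `others` is a Series, an ndarray, or a list.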

@toobaz
Member Author

toobaz commented Jul 28, 2014

I am a bit lost (I understood the diagnosis, not the cure)... but I will take another look in the next few days.

@toobaz
Member Author

toobaz commented Aug 2, 2014

This seems to work in all cases I can conceive...
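
A check along the following lines could confirm the behaviour on the filtered frame. It is a sketch that assumes a pandas version which includes the fix; it is not the test that was added to pandas.

```python
import pandas as pd

# Rebuild the frame from the report and filter it so label 0 disappears.
df = pd.DataFrame(index=pd.MultiIndex.from_product(
    [[2011, 2012], [1, 2, 3]], names=["year", "month"]))
df = df.reset_index()
df = df[df.month > 1]

str_year = df.year.astype("str")
str_month = df.month.astype("str")

# With the fix in place this should concatenate row by row
# instead of raising KeyError: 0.
str_both = str_year.str.cat(str_month, sep=" ")

print(list(str_both))
```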
