ENH: itertuples() returns namedtuples (closes #11269) #11325
Tests! Take a frame with multiple dtypes, call `list` on `itertuples()`, then assert that the first tuple generated has the correct fields, named correctly (including proper handling of `rename`). We also need tests for Python 2.6, both where renaming fails (but the plain tuple works) and where it fails completely. These should go with the other `itertuples` tests.
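A minimal sketch of the field-name assertions described above, using only the standard library rather than pandas. The helper `row_type` is hypothetical: it mirrors the namedtuple construction discussed in this PR but is not pandas code. With `rename=True`, invalid identifiers, keywords and duplicate names are replaced by positional names such as `_2`.

```python
from collections import namedtuple


def row_type(columns, index=True, name="Pandas"):
    """Hypothetical helper: build the namedtuple class used for each row."""
    fields = (["Index"] if index else []) + [str(c) for c in columns]
    # rename=True replaces invalid names (non-identifiers, keywords,
    # duplicates, leading underscores) with "_<position>"
    return namedtuple(name, fields, rename=True)


# Mixed column labels: a valid name, a name with a space, a keyword
Row = row_type(["a", "b c", "def"])
print(Row._fields)  # ('Index', 'a', '_2', '_3')

# A row holding values of several dtypes
row = Row(0, 1, "x", 2.5)
print(row.a, row._2, row._3)  # 1 x 2.5

# Integer labels are not identifiers either, so they are renamed too
print(row_type([1, 2])._fields)  # ('Index', '_1', '_2')
```

The integer-label case produces the same `('Index', '_1', '_2')` fields that the test in this PR asserts.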
@@ -641,7 +641,7 @@ def iterrows(self):
def itertuples(self, index=True):
Hmm, maybe accept a `name` parameter (defaulting to `pandas`).
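A sketch of what a `name` parameter would control, using a hypothetical helper `make_rows`: the typename passed to `collections.namedtuple` is what appears in each row's `repr`. The capitalized default `"Pandas"` is an assumption for illustration.

```python
from collections import namedtuple


def make_rows(columns, rows, name="Pandas"):
    """Hypothetical helper: yield each row as a namedtuple called `name`."""
    Row = namedtuple(name, columns, rename=True)
    return (Row(*values) for values in rows)


first = next(make_rows(["a", "b"], [(1, 2.0), (3, 4.0)]))
print(first)  # Pandas(a=1, b=2.0)

custom = next(make_rows(["a", "b"], [(1, 2.0)], name="Record"))
print(custom)  # Record(a=1, b=2.0)
```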
Need some tests.
Added tests for
Sorry for the noise, but I'm hitting
For some reason the commit with the updated tests was not pushed; trying again.
if sys.version >= LooseVersion('2.7'):
    self.assertEqual(tup._fields, ('Index', '_1', '_2'))
Add a test for more than 255 columns.
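A sketch of such a test against a hypothetical, pandas-free helper `iter_rows` that mirrors the cap this PR ends up using (`len(self.columns) + index <= 255`): up to 255 fields yield namedtuples, anything wider falls back to plain tuples. The helper and its column-list data layout are assumptions for illustration only.

```python
from collections import namedtuple

# Hypothetical cap mirroring the limit discussed in this PR:
# rows with more than 255 fields fall back to plain tuples
MAX_FIELDS = 255


def iter_rows(columns, data, index=True, name="Pandas"):
    """Hypothetical itertuples-like helper.

    `columns` holds the labels and `data` one list of values per column.
    """
    arrays, fields = [], []
    if index:
        n_rows = len(data[0]) if data else 0
        arrays.append(range(n_rows))
        fields.append("Index")
    arrays.extend(data)
    fields.extend(str(c) for c in columns)
    if len(fields) <= MAX_FIELDS:
        Row = namedtuple(name, fields, rename=True)
        return (Row(*values) for values in zip(*arrays))
    # Too many fields: fall back to ordinary tuples
    return zip(*arrays)


narrow = next(iter_rows(["a", "b"], [[1, 2], [3, 4]]))
print(narrow)  # Pandas(Index=0, a=1, b=3)

wide = next(iter_rows(["c%d" % i for i in range(300)], [[0]] * 300))
print(type(wide).__name__, len(wide))  # tuple 301
```

A useful boundary check is 254 columns plus the index (exactly 255 fields), which should still produce a namedtuple.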
OK, thanks for your comments. I was quite happy to have a regular-tuple fallback set up.
"""
arrays = []
if index:
    arrays.append(self.index)
    fields = ["Index"] + list(self.columns)
else:
    fields = self.columns

# use integer indexing because of possible duplicate column names
arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))
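Why the code reaches for positional (`iloc`) access can be shown without pandas. In this plain-Python stand-in (parallel lists of labels and column values, an assumption for illustration), a label-keyed lookup silently collapses duplicate labels, while positional access keeps every column.

```python
# Hypothetical frame stand-in: parallel lists of labels and column values
labels = ["x", "x", "y"]
columns = [[1, 2], [10, 20], [100, 200]]

# Label-based lookup collapses duplicate labels: only the last "x" survives
by_label = dict(zip(labels, columns))
print(len(by_label))  # 2

# Positional access keeps every column, as iloc[:, k] does
arrays = [columns[k] for k in range(len(labels))]
rows = list(zip(*arrays))
print(rows)  # [(1, 10, 100), (2, 20, 200)]
```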
I think you can include this in the generator itself, rather than creating everything at once (the `arrays.extend` line).
Do you mean something along the lines of:
arrays = []
fields = []
if index:
    arrays.append(self.index)
    fields.append("Index")
# use integer indexing because of possible duplicate column names
arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))
# `rename` is unsupported in Python 2.6
try:
    itertuple = collections.namedtuple(name, fields + list(self.columns), rename=True)
except:
    # fallback to regular tuples
    return zip(*arrays)
return (itertuple(*row) for row in zip(*arrays))
Yep, I think that makes it a generator all the way through (an easy way to test is to call `next` on it inside a `timeit`).
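A sketch of that check in Python 3 (where `zip` is itself lazy), with a hypothetical `lazy_rows` helper and a counting column standing in for a real frame: nothing is pulled from the columns until `next` is called, and then only one value per column.

```python
import timeit
from collections import namedtuple


def lazy_rows(arrays, fields, name="Pandas"):
    """Hypothetical helper: rows are built on demand, one per next() call."""
    Row = namedtuple(name, fields, rename=True)
    return (Row(*values) for values in zip(*arrays))


class CountingColumn:
    """Iterable column that counts how many values have been pulled."""

    def __init__(self, values):
        self.values = values
        self.pulled = 0

    def __iter__(self):
        for value in self.values:
            self.pulled += 1
            yield value


col = CountingColumn(range(1_000_000))
rows = lazy_rows([col], ["a"])
print(col.pulled)  # 0: nothing is materialized up front
first = next(rows)
print(first, col.pulled)  # Pandas(a=0) 1

# The timeit check from the review: next() on a fresh generator
# should stay cheap no matter how many rows the columns hold
elapsed = timeit.timeit(
    lambda: next(lazy_rows([range(1_000_000)], ["a"])), number=100
)
print("100 x next(): %.4f s" % elapsed)
```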
OK, minor comment.
Needs a whatsnew note; put it in the API section.
OK, ping when green.
Python 2 turned out to happily accept an almost unlimited number of arguments, so I just set a hard limit of 255 fields.
@jreback All done, if you approve the last modification.
# use integer indexing because of possible duplicate column names
arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))
return zip(*arrays)

if len(self.columns) + index <= 255:
You don't need to do this explicitly; just use the try/except, and in the `except`, name the possible exceptions you are catching (always better than a bare one if you can).
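To name the exceptions, it helps to check which ones `collections.namedtuple` actually raises. This sketch runs on Python 3, where `rename` exists; the unknown-keyword case stands in for Python 2.6 rejecting `rename`, on the assumption that it surfaces there as the usual unexpected-keyword `TypeError`. The `raised_by` helper is hypothetical.

```python
from collections import namedtuple


def raised_by(func, *args, **kwargs):
    """Return the exception class raised by func(*args, **kwargs), or None."""
    try:
        func(*args, **kwargs)
    except Exception as exc:  # broad on purpose: we only report the type
        return type(exc)
    return None


# An invalid type name is rejected even with rename=True
print(raised_by(namedtuple, "not valid", ["a"], rename=True).__name__)  # ValueError

# Invalid field names are rejected unless rename=True is passed
print(raised_by(namedtuple, "Row", ["a b"]).__name__)  # ValueError
print(raised_by(namedtuple, "Row", ["a b"], rename=True))  # None

# An unknown keyword argument (stand-in for `rename` on Python 2.6)
print(raised_by(namedtuple, "Row", ["a"], no_such_option=True).__name__)  # TypeError
```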
OK then. Please add a comment to that effect (explaining why you are explicitly using this maximum), and make it `< 256`.
It's just that calling the method with a large number of fields on Python 2 was really slow.
@mjoud I see. OK then.
return zip(*arrays)

return (itertuple(*row) for row in zip(*arrays))
else:
Just do `return ...` here (you don't need the `else`).
@jreback OK, fixed.
# fallback to regular tuples
return zip(*arrays)

return (itertuple(*row) for row in zip(*arrays))
OK, put the `return (itertuple...)` inside the `try`; then the `except` can simply `pass`, and you already have the `zip(*arrays)` fallback. Much simpler.
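A sketch of the suggested shape, using a hypothetical `itertuples_like` helper over plain column lists rather than the actual pandas method: the namedtuple path returns from inside the `try`, the `except` just passes, and every failure falls through to the single `zip(*arrays)` fallback. The two exception types caught are an assumption about the likely failure modes.

```python
from collections import namedtuple


def itertuples_like(arrays, fields, name="Pandas"):
    """Hypothetical helper: namedtuples if possible, else plain tuples."""
    try:
        Row = namedtuple(name, fields, rename=True)
        return (Row(*values) for values in zip(*arrays))
    except (TypeError, ValueError):
        # Assumed failure modes when building the namedtuple class
        pass
    # Single fallback path shared by every failure: ordinary tuples
    return zip(*arrays)


good = next(itertuples_like([[0, 1], ["x", "y"]], ["Index", "a"]))
print(good)  # Pandas(Index=0, a='x')

bad = next(itertuples_like([[0, 1], ["x", "y"]], ["Index", "a"], name="not valid"))
print(bad)  # (0, 'x')
```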
Thanks, that's better. I'm not exactly showing off here...
ENH: itertuples() returns namedtuples (closes #11269)
@mjoud Thanks! Nice PR.
Closes #11269.
This will make `itertuples` return namedtuples. I'm not sure about tests here. Since `namedtuple` is a drop-in replacement for ordinary tuples (once created), I naively expect things to work.
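A quick standard-library check of the drop-in claim: a namedtuple is a `tuple` subclass, so equality, indexing and unpacking behave as before. The row layout below is illustrative only.

```python
from collections import namedtuple

Row = namedtuple("Pandas", ["Index", "a", "b"])
row = Row(0, 1, 2.5)

print(isinstance(row, tuple))  # True: a namedtuple subclasses tuple
print(row == (0, 1, 2.5))      # True: equality is plain tuple equality
idx, a, b = row                # unpacking works as before
print(row[1], row.a)           # 1 1: positional and attribute access
print(type(row) is tuple)      # False: exact type checks do differ
```

The last line is where "drop-in" can break: code that tests `type(x) is tuple` rather than `isinstance(x, tuple)` will see a different result.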