ENH: itertuples() returns namedtuples (closes #11269) #11325

Merged
merged 1 commit into from
Oct 28, 2015

@mjoud mjoud commented Oct 14, 2015

closes #11269

This makes itertuples() return namedtuples. I'm not sure about tests here. Since namedtuples are a drop-in replacement for ordinary tuples (once they are created), I naively expect things to work.
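
The drop-in claim can be illustrated with the standard library alone. This is a minimal sketch; the type name `Row` and its fields are made up for illustration and are not taken from the PR:

```python
from collections import namedtuple

# Hypothetical row type; the name and fields are illustrative only.
Row = namedtuple("Row", ["Index", "a", "b"])
row = Row(0, 1.5, "x")

# A namedtuple is a tuple subclass, so existing tuple-based code keeps working.
assert isinstance(row, tuple)
assert row == (0, 1.5, "x")   # compares equal to a plain tuple
assert row[1] == 1.5          # positional indexing
idx, a, b = row               # unpacking

# The addition is attribute access by field name.
assert row.a == 1.5
```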

@jreback
Contributor

jreback commented Oct 14, 2015

tests!

take a frame with multiple dtypes, call list() on its itertuples(), then assert that the first tuple generated has the correct fields, named correctly (and handles rename properly).

need a test for Python 2.6, where renaming fails (but the tuple works), and for the case where it fails completely.

these should go with other tests of itertuples.
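
For the renaming case, the standard-library behavior that such a test would rely on can be sketched as follows. This assumes Python 2.7 or later, where `collections.namedtuple` accepts `rename=True`; the field names and the type name `Pandas` are chosen here for illustration only:

```python
from collections import namedtuple

# Field names that are not valid identifiers (keywords, here) are replaced
# by positional names of the form _<position> when rename=True.
fields = ["Index", "def", "return"]
Renamed = namedtuple("Pandas", fields, rename=True)
print(Renamed._fields)

# Without rename=True the same field names are rejected with a ValueError.
try:
    namedtuple("Pandas", fields)
    rejected = False
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```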

@jreback jreback added the Enhancement, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), and Compat (pandas objects compatibility with Numpy or Python functions) labels Oct 16, 2015
@@ -641,7 +641,7 @@ def iterrows(self):

def itertuples(self, index=True):
Contributor

hmm, maybe accept a name parameter (default to pandas)

@jreback
Contributor

jreback commented Oct 16, 2015

need some tests

@mjoud
Author

mjoud commented Oct 20, 2015

Added tests for namedtuples and the name argument. Squashed.

@mjoud
Author

mjoud commented Oct 20, 2015

Sorry for the noise, but I'm hitting SyntaxError: more than 255 arguments with a large number of columns. I've modified things to just return regular tuples via zip() on any exception, as things were before.

@mjoud
Author

mjoud commented Oct 20, 2015

The commit with updated tests was for some reason not pushed, trying again.


if sys.version >= LooseVersion('2.7'):
    self.assertEqual(tup._fields, ('Index', '_1', '_2'))

Contributor

add a test for > 255 columns

@jreback jreback added this to the 0.17.1 milestone Oct 20, 2015
@mjoud
Author

mjoud commented Oct 20, 2015

Ok, thanks for your comments. I was quite happy to have a regular tuple fallback set up.


"""
arrays = []
if index:
    arrays.append(self.index)
    fields = ["Index"] + list(self.columns)
else:
    fields = self.columns

# use integer indexing because of possible duplicate column names
arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))
Contributor

I think you can include this in the generator itself, rather than creating all at once (the arrays.extend line)

Author

Do you mean something along the lines of

        arrays = []
        fields = []                                                     
        if index:                                                               
            arrays.append(self.index)                                           
            fields.append("Index")

        # use integer indexing because of possible duplicate column names       
        arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))        

        # `rename` is unsupported in Python 2.6                                 
        try:                                                                    
            itertuple = collections.namedtuple(name, fields+list(self.columns), rename=True)       
        except:                                                                 
            # fallback to regular tuples                                        
            return zip(*arrays)                                                 

        return (itertuple(*row) for row in zip(*arrays)) 

Contributor

yep, I think that makes it a generator all the way through (an easy way to test is to call next() on it in a timeit)
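
One way to check the laziness suggested here is sketched below, with infinite iterators standing in for the index and column arrays (an assumption of this sketch; the PR operates on a DataFrame). Note that `zip` is itself lazy in Python 3, whereas the Python 2 `zip` of that era built a full list, so the check is written for Python 3:

```python
from collections import namedtuple
from itertools import count

Row = namedtuple("Row", ["Index", "value"])  # hypothetical row type

# Infinite inputs: an eager implementation would never return.
arrays = [count(0), count(100)]

rows = (Row(*values) for values in zip(*arrays))

# Only the rows that are requested get built.
first = next(rows)
second = next(rows)
print(first, second)
```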

@jreback
Contributor

jreback commented Oct 20, 2015

ok minor comment.

@jreback
Contributor

jreback commented Oct 20, 2015

need a whatsnew note. put in the API section

@jreback
Contributor

jreback commented Oct 20, 2015

ok, ping when green.

@mjoud
Author

mjoud commented Oct 21, 2015

Python 2 turned out to happily accept an almost infinite number of arguments, so I just set a hard limit of 255 fields.

@mjoud
Author

mjoud commented Oct 21, 2015

@jreback all done if you approve the last modification.


# use integer indexing because of possible duplicate column names
arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))
return zip(*arrays)

if len(self.columns) + index <= 255:
Contributor

you don't need to do this explicitly, just use the try/except, and in the except name the possible exceptions that you are catching (always better than a bare one if you can)

Contributor

ok then. pls add a comment to that effect (of why you are explicitly using this max number) (and make it < 256).

@mjoud
Author

mjoud commented Oct 25, 2015

It's just that calling the method with a large number of fields on Python 2 was really slow.

@jreback
Contributor

jreback commented Oct 25, 2015

@mjoud i c. ok then.

return zip(*arrays)

return (itertuple(*row) for row in zip(*arrays))
else:
Contributor

just do return .... here (don't need the else)

@mjoud
Author

mjoud commented Oct 26, 2015

@jreback ok fixed

# fallback to regular tuples
return zip(*arrays)

return (itertuple(*row) for row in zip(*arrays))
Contributor

ok, put the return (itertuple.....) IN the try. then the except can simply pass and you already have the zip(*arrays). much simpler
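
The shape being asked for might look roughly like the sketch below. It is not the merged pandas code: `iter_rows` is a hypothetical stand-alone function, plain lists stand in for the index and columns, and the exceptions named in the `except` clause are this sketch's guess at what `namedtuple` can raise.

```python
from collections import namedtuple


def iter_rows(fields, arrays, name="Pandas"):
    """Return an iterator of rows across `arrays` (hypothetical helper)."""
    # Explicit cap discussed in the thread: very wide namedtuples were
    # reported as slow on Python 2 and as hitting
    # "SyntaxError: more than 255 arguments" otherwise.
    if len(fields) < 256:
        try:
            row_type = namedtuple(name, fields, rename=True)
            return (row_type(*row) for row in zip(*arrays))
        except (TypeError, ValueError, SyntaxError):
            # e.g. rename= unsupported, or an invalid type name
            pass

    # fallback to regular tuples
    return zip(*arrays)


# Two fields: rows come back as namedtuples.
named = list(iter_rows(["Index", "a"], [[0, 1], [10, 20]]))

# 300 fields: over the cap, so rows come back as plain tuples.
wide_fields = ["c%d" % i for i in range(300)]
wide = list(iter_rows(wide_fields, [[i] for i in range(300)]))
```

Returning from inside the `try` means the plain-tuple `zip(*arrays)` at the end serves both the over-the-cap case and the failure case, which is the simplification the comment points at.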

@mjoud
Author

mjoud commented Oct 27, 2015

Thanks, better. I'm not exactly showing off here...

jreback added a commit that referenced this pull request Oct 28, 2015
ENH: itertuples() returns namedtuples (closes #11269)
@jreback jreback merged commit 8a46de4 into pandas-dev:master Oct 28, 2015
@jreback
Contributor

jreback commented Oct 28, 2015

@mjoud thanks!

nice PR.

@mjoud mjoud deleted the namedtuples branch October 28, 2015 11:58
Development

Successfully merging this pull request may close these issues.

ENH: make itertuples() return namedtuples
2 participants