-
-
Notifications
You must be signed in to change notification settings - Fork 19.6k
index is included in memory usage by default #11867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/base.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can put this in PandasObject I think
pandas/tests/test_frame.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test this on series/Index/Categorical as well. (for Index put a test in test_index on the Base which will run for all indexes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added onto the existing tests in IndexOps - lmk if that's not OK
c462b19 to
c99cda7
Compare
pandas/tests/test_base.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't use this
use tm.assert_almost_equal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we disabled the test routines that one shouldn't use - in tm.TestCase
but maybe that's an open issue (or if not can you create one)
we are dropping 2.6 soon - but still like for there to be 1 way to do testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - although the functionality is materially worse than the nosetests one (actually kinda peculiar with the bool argument for 3 vs 5):
check_less_precise : bool, default False
Specify comparison precision.
5 digits (False) or 3 digits (True) after decimal points are compared.
51de6c4 to
36d75d8
Compare
|
@jreback |
doc/source/whatsnew/v0.18.0.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I think this should call with deep=True.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have enough context to know whether deep should be True or False by default, but I can see the logic behind the default for .memory_usage should being the same as that for sys.getsizeof, given the similar use cases.
Or why are the use cases different? Because a user calling from the system wants accuracy vs. speed but a user calling from pandas wants speed vs. accuracy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using __sizeof__ should have deep=True which gives the 'best' report of memory used. This is not the default for .memory_usage() because its expensive to compute (potentially). see #11595 where it can be somewhat slow (though the cython impl helps).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. if someone is calling sys.getsizeof(df) then I think its appropriate to give the most accurate (if maybe somewhat non-performant) answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, if it's a material performance drag and a different enough set of use cases, sobeit
3c02600 to
6045ae7
Compare
|
@jreback updated & green |
pandas/core/base.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nothing should raise a TypeError, what does? rather than catching this error, need to fix the underlying.
|
@jreback updated & green |
|
@MaximilianR lgtm. can you run: ping when green. |
|
Done (phew...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, though we usually do this with parens
|
lgtm. ping when green. |
pandas/tests/test_frame.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how did this pass before?
this will never be true. as deep=True is >> non-deep with object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that df isn't an object dtype, I don't think this was affected the results - I will fix regardless
|
merged via 60cacab thanks! |
...and sys.getsizeof returns correct value. Closes #11597
Is this the best implementation for a common method across classes?